Looking for champion - std.lang.d.lex
Nick Sabalausky
a at a.a
Tue Oct 26 21:01:20 PDT 2010
"Walter Bright" <newshound2 at digitalmars.com> wrote in message
news:ia8321$vug$1 at digitalmars.com...
> Nick Sabalausky wrote:
>> "Walter Bright" <newshound2 at digitalmars.com> wrote in message
>> news:i9qd8q$1ls4$1 at digitalmars.com...
>>> 4. the tokens should be a value type, not a reference type
>>
>> I'm curious, is your reason for this purely to avoid allocations during
>> lexing, or are there other reasons too?
>
> It's one big giant reason. Storage allocation gets unbelievably costly in
> a lexer. Another is it makes tokens easy to copy. Another one is that
> classes are for polymorphic behavior. What kind of polymorphic behavior
> would one want with tokens?
>
Honestly, I'm not entirely certain whether or not Goldie actually needs its
tokens to be classes instead of structs, but I'll explain my current usage:
In the basic "dynamic" style, every token in Goldie, terminal or
nonterminal, is of type "class Token", no matter what symbol or part of the
grammar it represents. But Goldie has an optional "static" style which
creates a class hierarchy, for the sake of compile-time type safety, with
"Token" as the base.
Suppose you have a grammar named "simple" that's a series of one or more a's
and b's separated by plus signs:
<Item> ::= 'a' | 'b'
<List> ::= <List> '+' <Item> | <Item>
So there are three terminals: "a", "b", and "+"
And two nonterminals: "<Item>" and "<List>", each with exactly two possible
reductions.
So in dynamic-style, all of those are "class Token", and that's all that
exists. But with the optional static-style, the following class hierarchy is
effectively created:
class Token;
class Token_simple : Token;
class Token_simple!"a" : Token_simple;
class Token_simple!"b" : Token_simple;
class Token_simple!"+" : Token_simple;
class Token_simple!"<Item>" : Token_simple;
class Token_simple!"<List>" : Token_simple;
class Token_simple!("<Item>", "a") : Token_simple!"<Item>";
class Token_simple!("<Item>", "b") : Token_simple!"<Item>";
class Token_simple!("<List>", "<List>", "+", "<Item>") :
Token_simple!"<List>";
class Token_simple!("<List>", "<Item>") : Token_simple!"<List>";
So rules inherit from the nonterminal they reduce to; terminals and
nonterminals inherit from a dummy class dedicated specifically to the given
grammar; and that inherits from plain old dynamic-style "Token". All of
those template parameters are validated at compile-time.
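The rough idea behind that validation, boiled down to a sketch (this is
heavily simplified, and not the actual Goldie code), is just a static assert
against the grammar's list of symbol names:

    // Hypothetical, simplified sketch; names here are made up for illustration.
    enum string[] simpleSymbols = ["a", "b", "+", "<Item>", "<List>"];

    // Plain function, but evaluated at compile-time (CTFE) inside the
    // static assert below.
    bool isValidSymbol(string name)
    {
        foreach(s; simpleSymbols)
            if(s == name)
                return true;
        return false;
    }

    class GrammarToken(string symbol) : Token
    {
        // Misspelled symbol names are rejected before the program ever runs:
        static assert(isValidSymbol(symbol),
            "'" ~ symbol ~ "' is not a symbol of grammar 'simple'");
    }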
(At some point I'd like to make it possible to specify the rule-based tokens
as something like:
Token!("<List> ::= <List> '+' <Item>"), but I haven't gotten to it yet.)
There's one more trick: The plain old Token exposes a member "subX" which
can be numerically indexed to obtain the sub-tokens (for terminals,
subX.length==0):
void foo(Token token)
{
    if(token.matches("<List>", "<List>", "+", "<Item>"))
    {
        auto leftSide = token.subX[0];
        auto rightSide = token.subX[2];
        //auto dummy = token.subX[10]; // Run-time error
        static assert(is(typeof(leftSide) == Token));
        static assert(is(typeof(rightSide) == Token));
    }
}
Note that it's impossible for the "matches" function to verify at
compile-time that its arguments are valid.
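For instance, this compiles without a peep even though one of the names is
misspelled; the mistake can only show up at run-time:

    void bar(Token token)
    {
        // Oops: "<Lsit>" isn't a symbol of the grammar, but the compiler
        // has no way of knowing that; the arguments are ordinary
        // run-time strings.
        if(token.matches("<Lsit>", "<List>", "+", "<Item>"))
        {
            // ...
        }
    }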
All of the static-style token types retain all of that for total
compatibility with the dynamic-style. But the static-style nonterminals
provide an additional member, "sub", that can be used like this:
void foo(Token_simple!("<List>", "<List>", "+", "<Item>") token)
{
    auto leftSide = token.sub!0;
    auto rightSide = token.sub!2;
    //auto dummy = token.sub!10; // Compile-time error
    static assert(is(typeof(leftSide) == Token_simple!"<List>"));
    static assert(is(typeof(rightSide) == Token_simple!"<Item>"));
}
As for whether or not this effect can be reasonably accomplished with
structs: I have no idea; I haven't really looked into it.
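Off the top of my head, the token data itself could certainly live in a value
type, maybe something vaguely like this (completely untested, just thinking
out loud, and not anything Goldie does today):

    // Completely hypothetical value-type token; not Goldie code.
    struct TokenValue
    {
        size_t symbolId;    // index into a table of the grammar's symbol names
        TokenValue[] subX;  // sub-tokens (empty for terminals)
        string text;        // matched source text (for terminals)
    }

Copying one of those is just a cheap shallow copy, and the only allocations
left are the subX arrays themselves. The hard part, I suspect, would be
recreating the static-style compile-time guarantees without a class hierarchy
to hang them on.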