Lexer in D

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Mar 2 14:24:40 PST 2013


03-Mar-2013 00:03, Namespace пишет:
>> This is a mess of conditionals is what a direct array of 256 entries
>> would avoid:
>>
>>     int idx;
>>     if (value[0] != '_') {
>>         idx = value[0] - 'a';
>>         if (idx == 26) return false;
>>     } else {
>>         idx = 26;
>>     }
>>
>>     if (idx < 0 || idx > 26) {
>>         // debug writeln("kword: ", idx, ':', value[0]);
>>         return false;
>>     }
>>
>>     if (keywords[idx] is null) return false;
>>
>>     return keywords[idx].canFind(value);
>>
>> Gaining some speed in the process. Plus another layer of array to
>> discern keywords by length. You see why I suggested to generate the
>> code in the first place ? ;)
>>
>> BTW what's the reason to separate keywords and type keywords? They are
>> processed the same in lexer and only parser somewhere up above knows
>> what to do with these regardless. Just return different token values
>> for each.
>
> I changed it and merged them together.
> Also I use now a array of 256 entities, but I must keep the check if idx
> is < 0, because 'D' - 'a' is negative.
> And yes I see what you meant.^^
>

Obviously, you still largely don't :)

Use the full byte to do the lookup dammit. Saving few array slots 
doesn't bring your speed up. Using full byte *directly* as index does 
speed things up because you don't have to do *any* computations and 
*conditionals*.

Conditionals that can easily go both ways are the worst thing 
performance-wise.

I don't know how to phrase that better.

> Code: http://dpaste.1azy.net/317241c0
> I reach still 215 - 230 msecs.

Use profiling to determine what takes the most of the time. If you don't 
like callgrind/cachegrind (I seriously don't know why not use it) try 
dmd's -profile.

Looking at your main code I see some spaghetti code and passing index by 
pointer. Don't do that - encapsulate lexing state in some kind of 
struct, then the index would be implicitly passed to all functions by 
this pointer.

It may very well be the case you are bound by some other code. In any 
case after some heroic efforts Dscanner lexes std.datetime in 24 msecs, 
and that on linux VM powered by the oldish AMD Phenom II. What kind of 
CPU do you use?

-- 
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list