Let's stop parser Hell

Roman D. Boiko rb at d-coding.com
Thu Jul 5 13:02:15 PDT 2012


On Thursday, 5 July 2012 at 19:54:39 UTC, Philippe Sigaud wrote:
> On Thu, Jul 5, 2012 at 8:28 PM, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> I'll be glad to buy for you any book you might feel you need for
>> this. Again, there are few things more important for D right now
>> than exploiting its unmatched-by-competition features to great
>> ends. I don't want the lack of educational material to hold you
>> down. Please continue working on this and let me know of what you
>> need.
>
> That's nice of you, if a bit extreme for a public mailing list :)
> Andrei, money is no problem :)
> Interest in the field of parsing is no problem.
> Interest in D's future is no problem.
> Having a demanding job, and three children, is a problem. No,
> scratch that, you know what I mean.

I have four, from 1 to 7 years old... Wouldn't call them a 
problem, though :)))

> But hey, Roman is doing interesting things on keyword parsing right
> now, and changing the parser generated by Pegged is not difficult.
> We will see where this thread leads. (Roman, you should send your
> results here, because I'm still taken aback by the built-in AA
> speed compared to linear array look-up for 100 keywords).

Well, I wouldn't call those "results" yet. Just some benchmarks. 
Here they are:

isKeyword_Dummy (baseline):           427 [microsec] total,   28 [nanosec / lookup].
isKeyword_Dictionary:                1190 [microsec] total,  214 [nanosec / lookup].
isKeyword_SwitchByLengthThenByChar:   466 [microsec] total,   83 [nanosec / lookup].
isKeyword_BinaryArrayLookup:         2722 [microsec] total,  490 [nanosec / lookup].
isKeyword_LinearArrayLookup:        13822 [microsec] total, 2490 [nanosec / lookup].
isKeyword_UnicodeTrie:               1317 [microsec] total,  237 [nanosec / lookup].
isKeyword_UnicodeTrieBoolLookup:     1072 [microsec] total,  193 [nanosec / lookup].
Total: 22949 identifiers + 5551 keywords.

isKeyword_Dummy (baseline):          2738 [microsec] total,   50 [nanosec / lookup].
isKeyword_Dictionary:                4247 [microsec] total,  242 [nanosec / lookup].
isKeyword_SwitchByLengthThenByChar:  1593 [microsec] total,   91 [nanosec / lookup].
isKeyword_BinaryArrayLookup:        14351 [microsec] total,  820 [nanosec / lookup].
isKeyword_LinearArrayLookup:        59564 [microsec] total, 3405 [nanosec / lookup].
isKeyword_UnicodeTrie:               4167 [microsec] total,  238 [nanosec / lookup].
isKeyword_UnicodeTrieBoolLookup:     3466 [microsec] total,  198 [nanosec / lookup].
Total: 104183 identifiers + 17488 keywords.
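
For readers unfamiliar with the strategy names above, here is a minimal
sketch of what four of them might look like. This is not the benchmarked
code: the function names, the eight-word keyword subset, and the exact
shape of each lookup are assumptions made for illustration (the real
test used about 100 keywords), and the trie variants are left out.

```d
// Hypothetical sketch only; not the code that produced the numbers above.

// Small, sorted subset of D keywords, chosen for illustration.
immutable string[] keywords =
    ["alias", "auto", "bool", "if", "in", "int", "is", "void"];

// Linear array look-up: compare against every keyword in turn.
bool isKeywordLinear(string s)
{
    foreach (k; keywords)
        if (k == s) return true;
    return false;
}

// Binary array look-up: relies on `keywords` being sorted.
bool isKeywordBinary(string s)
{
    size_t lo = 0, hi = keywords.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (keywords[mid] == s) return true;
        if (keywords[mid] < s) lo = mid + 1;
        else hi = mid;
    }
    return false;
}

// Dictionary: built-in associative array, filled lazily on first call.
bool isKeywordAA(string s)
{
    static bool[string] set;
    if (set is null)
        foreach (k; keywords) set[k] = true;
    return (s in set) !is null;
}

// Switch by length, then by character: most identifiers are rejected
// after one or two integer comparisons, without hashing the string.
bool isKeywordSwitch(string s)
{
    switch (s.length)
    {
        case 2:
            switch (s[0])
            {
                case 'i': return s[1] == 'f' || s[1] == 'n' || s[1] == 's';
                default:  return false;
            }
        case 3:
            return s == "int";
        case 4:
            switch (s[0])
            {
                case 'a': return s == "auto";
                case 'b': return s == "bool";
                case 'v': return s == "void";
                default:  return false;
            }
        case 5:
            return s == "alias";
        default:
            return false;
    }
}

void main()
{
    // All four strategies should agree on every input.
    foreach (word; ["if", "int", "alias", "void", "foo", "i", "integer", ""])
    {
        immutable expected = isKeywordLinear(word);
        assert(isKeywordBinary(word) == expected);
        assert(isKeywordAA(word) == expected);
        assert(isKeywordSwitch(word) == expected);
    }
}
```

The nested switch would normally be generated rather than written by
hand, since it has to be kept in sync with the keyword list.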

> As Dmitry said, we might hit a CTFE wall: memory consumption,
> computation speed, ...
> (*channelling Andrei*: then we will correct whatever makes CTFE a
> problem. Right)
>
> Philippe
>
> (Hesitating between 'The Art of the Metaobject Protocol' and
> 'Compilers, Techniques and Tools', right now)



