DDMD and such.

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Sep 28 14:40:23 PDT 2011


On 29.09.2011 0:43, Nick Sabalausky wrote:
> "Jonathan M Davis"<jmdavisProg at gmx.com>  wrote in message
> news:mailman.261.1317239287.26225.digitalmars-d at puremagic.com...
>>
>> I would point out that there is an intention to eventually get a D lexer
>> and parser into Phobos so that tools can take advantage of them. Those
>> could eventually lead to a frontend in D but would provide benefits far
>> beyond simply having the compiler in D.
>>
>
> Is the interest more in a D-specific lexer/parser or a generalized one? Or
> is it more of a split vote? I seem to remember interest both ways, but I
> don't know whether there's any consensus among the DMD/Phobos crew.
>
> A generalized lexer is nothing more than a regex engine that has more than
> one distinct accept state (which then gets run over and over until EOF).
> And the FSM is made simply by doing a combined regex "(regexForToken1 |
> regexForToken2 | regexForToken3 | ... )", and then each of those parts
> just gets its own accept state. Which makes me wonder...
>
> There was a GSoC project to overhaul Phobos's regex engine, wasn't there? Is
> that done? Is it designed in a way that the stuff above wouldn't be real
> hard to add?

That was me. And yes, that's exactly the kind of thing I'm pursuing right 
now. A lexer with a compile-time-generated FSM looks very promising, e.g. 
for DSELs.
In fact, I think I already have a good idea of how to reuse the existing 
code to get there.
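
Just to sketch what I mean by the multiple-accept-state approach -- the 
token set, the pattern and even the exact calls used here (ctRegex, 
matchFirst, named groups) are only illustrative, not the final design:

import std.regex, std.stdio;

void main()
{
    // One combined pattern; each alternative plays the role of a separate
    // accept state, distinguished here by a named capture group.
    // The token set is made up purely for the example.
    enum pattern = `(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*/=])|(?P<ws>\s+)`;
    auto re = ctRegex!pattern;              // matcher generated at compile time

    string src = "x = 42 + foo";
    while (src.length)
    {
        auto m = matchFirst(src, re);
        if (m.empty || m.pre.length)        // no token starts right here
        {
            writeln("lex error at: ", src);
            break;
        }
        if (m["num"].length)      writeln("NUM ", m.hit);
        else if (m["id"].length)  writeln("ID  ", m.hit);
        else if (m["op"].length)  writeln("OP  ", m.hit);
        // whitespace just gets skipped
        src = m.post;                       // continue after the accepted lexeme
    }
}

Each named alternative effectively acts as its own accept state; turning 
the loop into a proper range of tokens is the easy part.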

>
> And what about the algorithm? Is it a Thompson NFA, i.e., does it traverse
> the NFA as if it were a DFA, effectively "creating" the DFA on-the-fly?

As for the implementation, there are two algorithms operating on the same IR. 
The default one is a Thompson NFA, so yes, with some work I can make it 
generate code for a DFA instead of executing the IR directly. It's just that 
until now I've generated simpler structures from it, e.g. a bit-parallel 
prefix searcher.
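
To be clear about the technique itself (this is just the textbook 
simulation, not the actual IR or code in std.regex): keep the set of live 
NFA states and advance the whole set one input character at a time, so the 
current set of NFA states is exactly the DFA state computed on the fly. A 
toy, hand-built NFA for (a|b)*abb:

import std.stdio;

struct Nfa
{
    int[][char][] trans;    // trans[state][inputChar] = successor states
    int start;
    bool[] accepting;
}

bool simulate(Nfa n, string input)
{
    auto current = new bool[n.accepting.length];
    current[n.start] = true;
    foreach (c; input)
    {
        auto next = new bool[current.length];
        foreach (s, live; current)
        {
            if (!live) continue;
            if (auto targets = c in n.trans[s])
                foreach (t; *targets)
                    next[t] = true;
        }
        current = next;     // this set of NFA states *is* the DFA state
    }
    foreach (s, live; current)
        if (live && n.accepting[s])
            return true;
    return false;
}

void main()
{
    // Hand-built NFA for (a|b)*abb, states 0..3, state 3 accepting.
    Nfa n;
    n.trans.length = 4;
    n.trans[0] = ['a': [0, 1], 'b': [0]];   // loop, or guess the final "abb"
    n.trans[1] = ['b': [2]];
    n.trans[2] = ['b': [3]];
    // state 3 has no outgoing transitions
    n.start = 0;
    n.accepting = [false, false, false, true];

    writeln(simulate(n, "ababb"));  // true
    writeln(simulate(n, "abab"));   // false
}

Memoizing those state sets as they appear is all it takes to turn this 
into a lazily built DFA.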

> Or does it just traverse the NFA as an NFA? Or does it create an actual
> DFA and traverse that? An actual DFA would probably be best for a lexer.
> If a DFA, is it an optimized DFA? In my (limited) tests, it didn't seem
> like DFA optimization would yield a notable benefit on typical
> programming-language tokens. It seems to be more suited to pathological
> cases.
>
>

I'm also dubious whether DFA optimization techniques would be worth the 
hassle. Not a top priority.


-- 
Dmitry Olshansky

