DDMD and such.

Nick Sabalausky a at a.a
Wed Sep 28 13:43:57 PDT 2011


"Jonathan M Davis" <jmdavisProg at gmx.com> wrote in message 
news:mailman.261.1317239287.26225.digitalmars-d at puremagic.com...
>
> I would point out that there is an intention to eventually get a D lexer 
> and
> parser into Phobos so that tools can take advantage of them. Those could
> eventually lead to a frontend in D but would provide benefits far beyond 
> simply
> having the compiler in D.
>

Is the interest more in a D-specific lexer/parser or a generalized one? Or 
is it more of a split vote? I seem to remember interest both ways, but I 
don't know whether there's any consensus among the DMD/Phobos crew.

A generalized lexer is nothing more than a regex engine that has more than 
one distinct accept state (and which then gets run over and over until EOF). 
And the FSM is made simply by doing a combined regex "(regexForToken1 | 
regexForToken2 | regexForToken3 | ... )", where each of those parts then 
gets its own accept state. Which makes me wonder...
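To make the "one combined regex, multiple accept states" idea concrete, here's a minimal sketch in Python (illustrative only, not Phobos code). Each named group in the alternation plays the role of a distinct accept state, and `Match.lastgroup` reports which one fired; the token names and patterns are made up for a toy expression language:

```python
import re

# Hypothetical token definitions for a toy language. Order matters:
# earlier alternatives win on ties, just like earlier accept states.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),          # whitespace: matched but discarded
]

# The combined "(tok1 | tok2 | ...)" regex, one named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    # Run the engine over and over until EOF; each match is one token.
    for m in MASTER.finditer(text):
        kind = m.lastgroup       # which "accept state" we stopped in
        if kind != "SKIP":
            tokens.append((kind, m.group()))
    return tokens

print(tokenize("x = 42 + y"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```

(Note that `finditer` silently skips characters no alternative matches; a real lexer would want an error token or an explicit check for that.)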

There was a GSoC project to overhaul Phobos's regex engine, wasn't there? Is 
that done? Is it designed in a way that the stuff above wouldn't be real 
hard to add?

And what about the algorithm? Is it a Thompson NFA (ie, it traverses the NFA 
as if it were a DFA, effectively "creating" the DFA on-the-fly)? Or does it 
just traverse the NFA as an NFA? Or does it create an actual DFA and 
traverse that? An actual DFA would probably be best for a lexer. If a DFA, 
is it an optimized DFA? In my (limited) tests, it didn't seem like 
DFA-optimization would yield a notable benefit on typical 
programming-language tokens. It seems to be more suited to pathological 
cases.
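For reference, the Thompson-style traversal mentioned above can be sketched in a few lines (a hand-built toy NFA, nothing to do with Phobos's actual engine). The simulation tracks the *set* of live NFA states; each such set is exactly one state of the DFA that subset construction would produce, so caching those sets is what "creating the DFA on-the-fly" means:

```python
# Hand-built NFA for the regex "ab*a":
#   state 0 --a--> 1,  1 --b--> 1,  1 --a--> 2 (accept)
NFA = {
    (0, "a"): {1},
    (1, "b"): {1},
    (1, "a"): {2},
}
START, ACCEPT = {0}, {2}

def matches(s):
    states = START
    for ch in s:
        # Successor set = union of all transitions out of the live states.
        # This set IS a DFA state; memoizing (states, ch) -> successor
        # would build the DFA lazily.
        states = set().union(*(NFA.get((q, ch), set()) for q in states))
        if not states:
            return False   # dead state: no possible match
    return bool(states & ACCEPT)

print(matches("abbba"))  # True
print(matches("ab"))     # False
```

The appeal for a lexer is that a precomputed (or lazily cached) DFA does one table lookup per input character, with no backtracking, regardless of how many token alternatives the combined regex has.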



