DDMD and such.

Thu Sep 29 03:28:03 PDT 2011

On Wed, 28 Sep 2011 22:59:45 +0200, Jonathan M Davis <jmdavisProg at gmx.com> wrote:

> On Wednesday, September 28, 2011 13:43 Nick Sabalausky wrote:
>> "Jonathan M Davis" <jmdavisProg at gmx.com> wrote in message
>> news:mailman.261.1317239287.26225.digitalmars-d at puremagic.com...
>>
>> > I would point out that there is an intention to eventually get a D lexer
>> > and parser into Phobos so that tools can take advantage of them. Those
>> > could eventually lead to a frontend in D but would provide benefits far
>> > beyond simply having the compiler in D.
>>
>> Is the interest more in a D-specific lexer/parser or a generalized one? Or
>> is it more of a split vote? I seem to remember interest both ways, but I
>> don't know whether there's any consensus among the DMD/Phobos crew.
>>
>> A generalized lexer is nothing more than a regex engine that has more than
>> one distinct accept state (which then gets run over and over until EOF).
>> And the FSM is made simply by doing a combined regex "(regexForToken1 |
>> regexForToken2 | regexForToken3 | ... )", and then each of those parts
>> just gets its own accept state. Which makes me wonder...
>>
>> There was a GSoC project to overhaul Phobos's regex engine, wasn't there?
>> Is that done? Is it designed in a way that the stuff above wouldn't be
>> real hard to add?
>>
>> And what about the algorithm? Is it a Thompson NFA (i.e., it traverses the
>> NFA as if it were a DFA, effectively "creating" the DFA on the fly)? Or
>> does it just traverse the NFA as an NFA? Or does it create an actual DFA
>> and traverse that? An actual DFA would probably be best for a lexer. If a
>> DFA, is it an optimized DFA? In my (limited) tests, it didn't seem like
>> DFA optimization would yield a notable benefit on typical
>> programming-language tokens. It seems to be more suited to pathological
>> cases.
>
> There is some desire to have a lexer and parser in Phobos which basically
> have the same implementation as dmd (only in D instead of C++). That way,
> they're very close to the actual compiler, and it's easy to port fixes and
> improvements between the two.
>
> However, we definitely also want a more general lexer/parser generator
> which takes advantage of D's metaprogramming capabilities. Andrei was
> pushing more for that and doesn't really like the idea of the other, since
> it would reduce the desire to produce the more general solution. So, there
> _is_ some dissension on the matter. But there's definitely room for both.
> It's just a question of time and manpower.
>
> - Jonathan M Davis

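The combined-regex lexer Nick describes above can be sketched in a few lines. This is only an illustration, written in Python rather than D, and the token names and patterns are made up for the example. Note also that Python's `re` is a backtracking engine, not a DFA, and its alternation takes the first alternative that matches rather than the longest one, so the order of the token list matters here.

```python
import re

# Hypothetical token set; names and patterns are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]

# One combined pattern "(tok1|tok2|...)"; each named group plays the role
# of a distinct accept state.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Run the combined pattern over and over until the end of the input."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        # m.lastgroup names the alternative that matched, i.e. the token kind.
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
        pos = m.end()
```

Calling `list(tokenize("x = 42 + y1"))` yields one `(kind, text)` pair per token, with whitespace dropped.
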
What's currently missing for writing lexers/parsers is an approach to
range-based file reading with lookahead. Steven seems to be working on a
new stdio which tries to solve this issue.
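
To make concrete what I mean by reading with lookahead, here is a rough sketch, again in Python rather than D. The `Lookahead` class and its `peek`/`pop` methods are hypothetical names of my own and have nothing to do with the stdio design mentioned above. The idea is just that input is pulled from the file in chunks, and the lexer can inspect characters ahead of the current position without consuming them.

```python
import io
from collections import deque

class Lookahead:
    """Character stream with peek(k), fed from a file-like object in chunks."""

    def __init__(self, source, chunk_size=4096):
        self._source = source          # any object with read(n) returning str
        self._chunk_size = chunk_size
        self._buf = deque()            # characters read but not yet consumed

    def _fill(self, n):
        # Read more chunks until n characters are buffered or input ends.
        while len(self._buf) < n:
            chunk = self._source.read(self._chunk_size)
            if not chunk:
                break
            self._buf.extend(chunk)

    def peek(self, k=0):
        """Return the character k positions ahead without consuming it ('' at EOF)."""
        self._fill(k + 1)
        return self._buf[k] if k < len(self._buf) else ""

    def pop(self):
        """Consume and return the next character ('' at EOF)."""
        self._fill(1)
        return self._buf.popleft() if self._buf else ""
```

A lexer would, for example, call `peek(1)` after seeing `+` to decide between `+` and `+=` before committing to either token.
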