DDMD and such.
Nick Sabalausky
a at a.a
Wed Sep 28 14:03:56 PDT 2011
"Jonathan M Davis" <jmdavisProg at gmx.com> wrote in message
news:mailman.271.1317243599.26225.digitalmars-d at puremagic.com...
> On Wednesday, September 28, 2011 13:43 Nick Sabalausky wrote:
>> "Jonathan M Davis" <jmdavisProg at gmx.com> wrote in message
>> news:mailman.261.1317239287.26225.digitalmars-d at puremagic.com...
>>
>> > I would point out that there is an intention to eventually get a D
>> > lexer and parser into Phobos so that tools can take advantage of
>> > them. Those could eventually lead to a frontend in D but would
>> > provide benefits far beyond simply having the compiler in D.
>>
>> Is the interest more in a D-specific lexer/parser or a generalized one?
>> Or is it more of a split vote? I seem to remember interest both ways,
>> but I don't know whether there's any consensus among the DMD/Phobos crew.
>>
>> A generalized lexer is nothing more than a regex engine that has more
>> than one distinct accept state (which then gets run over and over until
>> EOF). And the FSM is made simply by doing a combined regex
>> "(regexForToken1 | regexForToken2 | regexForToken3 | ... )", and then
>> each of those parts gets its own accept state. Which makes me wonder...
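To make that concrete, here's the kind of thing I mean, hand-rolled with a
made-up token set (a real generalized lexer would build the combined FSM from
the token regexes instead of hard-coding the character tests):

import std.ascii : isAlpha, isDigit, isWhite;
import std.stdio;

enum TokKind { number, ident, op, eof }

struct Token { TokKind kind; string text; }

// One pass of the combined automaton: whichever branch the run ends in is
// the accept state, and the accept state tells us the token kind.
Token nextToken(ref string src)
{
    size_t i = 0;
    while (i < src.length && isWhite(src[i])) ++i;   // skip whitespace
    src = src[i .. $];
    if (src.length == 0) return Token(TokKind.eof, "");

    i = 0;
    TokKind kind;
    if (isDigit(src[0]))
    {
        while (i < src.length && isDigit(src[i])) ++i;
        kind = TokKind.number;   // accept state #1
    }
    else if (isAlpha(src[0]) || src[0] == '_')
    {
        while (i < src.length && (isAlpha(src[i]) || isDigit(src[i]) || src[i] == '_')) ++i;
        kind = TokKind.ident;    // accept state #2
    }
    else
    {
        i = 1;
        kind = TokKind.op;       // accept state #3: any other single char
    }

    auto tok = Token(kind, src[0 .. i]);
    src = src[i .. $];
    return tok;
}

void main()
{
    string src = "x1 = 42 + y";
    // The lexer is just that automaton run over and over until EOF.
    for (auto t = nextToken(src); t.kind != TokKind.eof; t = nextToken(src))
        writeln(t.kind, ": ", t.text);
}

The token kind just falls out of which accept state the run ends in; a
generated version would build one big DFA from "(tok1|tok2|...)" and do the
same thing.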
>>
>> There was a GSoC project to overhaul Phobos's regex engine, wasn't there?
>> Is that done? Is it designed in a way that the stuff above wouldn't be
>> real hard to add?
>>
>> And what about the algorithm? Is it a Thompson NFA (i.e., it traverses
>> the NFA as if it were a DFA, effectively "creating" the DFA
>> on-the-fly)? Or does it just traverse the NFA as an NFA? Or does it
>> create an actual DFA and traverse that? An actual DFA would probably be
>> best for a lexer. If a DFA, is it an optimized DFA? In my (limited)
>> tests, it didn't seem like DFA-optimization would yield a notable
>> benefit on typical programming-language tokens. It seems to be more
>> suited to pathological cases.
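Just to clarify what I mean by "traverses the NFA as if it were a DFA": you
keep a set of live NFA states and advance the whole set on each input
character, so each set is effectively one DFA state that gets "created" on
the fly. Toy sketch (the NFA representation here is just made up for
illustration):

import std.stdio;

struct Edge { char c; int to; }   // c == 0 means an epsilon edge

struct Nfa
{
    Edge[][] edges;   // edges[state] = outgoing edges of that state
    bool[]   accept;  // accept[state] = is it an accept state?
}

// Add a state to the set, following epsilon edges.
void addState(const Nfa nfa, bool[] set, int s)
{
    if (set[s]) return;
    set[s] = true;
    foreach (e; nfa.edges[s])
        if (e.c == 0) addState(nfa, set, e.to);
}

bool matches(const Nfa nfa, string input)
{
    auto cur = new bool[nfa.edges.length];
    addState(nfa, cur, 0);   // state 0 is the start state

    foreach (ch; input)
    {
        // The current *set* of NFA states is, in effect, one DFA state.
        auto next = new bool[nfa.edges.length];
        foreach (s, active; cur)
            if (active)
                foreach (e; nfa.edges[s])
                    if (e.c == ch) addState(nfa, next, e.to);
        cur = next;
    }

    foreach (s, active; cur)
        if (active && nfa.accept[s]) return true;
    return false;
}

void main()
{
    // NFA for "ab|ac": 0 -a-> 1, 1 -b-> 2 (accept), 1 -c-> 3 (accept)
    Edge[][] edges = [
        [Edge('a', 1)],
        [Edge('b', 2), Edge('c', 3)],
        [],
        []
    ];
    auto nfa = Nfa(edges, [false, false, true, true]);
    writeln(matches(nfa, "ab"));   // true
    writeln(matches(nfa, "ad"));   // false
}

Caching those state sets is essentially lazy DFA construction, which is why
I'm curious which of the three approaches the new engine actually uses.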
>
> There is some desire to have a lexer and parser in Phobos which
> basically have the same implementation as dmd (only in D instead of
> C++). That way, they're very close to the actual compiler, and it's
> easy to port fixes and improvements between the two.
The lexer seems like something that would change only on rare occasions.
Am I wrong?
>
> However, we definitely also want a more general lexer/parser generator
> which takes advantage of D's metaprogramming capabilities. Andrei was
> pushing more for that and doesn't really like the idea of the other,
> since it would reduce the desire to produce the more general solution.
> So, there _is_ some dissension on the matter. But there's definitely
> room for both. It's just a question of time and manpower.
>
I see.
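Just so I'm picturing the right thing: is the metaprogramming-based generator
roughly along these lines? Totally hypothetical sketch - none of these names
exist in Phobos, and it only handles fixed strings, but it shows the
CTFE-plus-mixin mechanism:

import std.stdio;

struct TokenDef { string name; string literal; }

enum TokenDef[] tokenDefs = [
    TokenDef("kwImport", "import"),
    TokenDef("kwModule", "module"),
    TokenDef("lparen",   "("),
    TokenDef("rparen",   ")"),
];

// CTFE-friendly integer-to-string helper.
string toStr(size_t n)
{
    string s;
    do { s = cast(char)('0' + n % 10) ~ s; n /= 10; } while (n);
    return s;
}

// Runs at compile time and returns D source for a lexOne() function.
string makeLexer(TokenDef[] defs)
{
    string code = "string lexOne(ref string src)\n{\n";
    foreach (d; defs)
    {
        auto len = toStr(d.literal.length);
        code ~= "    if (src.length >= " ~ len
              ~ " && src[0 .. " ~ len ~ "] == `" ~ d.literal ~ "`)\n"
              ~ "    { src = src[" ~ len ~ " .. $]; return \"" ~ d.name ~ "\"; }\n";
    }
    code ~= "    src = src[1 .. $]; return \"unknown\";\n}\n";
    return code;
}

// The generated lexer gets compiled right into this module.
mixin(makeLexer(tokenDefs));

void main()
{
    string src = "import(module)";
    while (src.length)
        writeln(lexOne(src));
}

Presumably the real thing would take regexes (or a whole grammar) instead of
fixed strings and build its tables at compile time, but the mixin/CTFE
mechanism would be the same.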