DDMD and such.

Gor Gyolchanyan gor.f.gyolchanyan at gmail.com
Wed Sep 28 14:22:17 PDT 2011


I'd gladly help you with this. I'm very excited to have a parser in Phobos!

On Thu, Sep 29, 2011 at 1:20 AM, Nick Sabalausky <a at a.a> wrote:
> "Jonathan M Davis" <jmdavisProg at gmx.com> wrote in message
> news:mailman.271.1317243599.26225.digitalmars-d at puremagic.com...
>> On Wednesday, September 28, 2011 13:43 Nick Sabalausky wrote:
>>> "Jonathan M Davis" <jmdavisProg at gmx.com> wrote in message
>>> news:mailman.261.1317239287.26225.digitalmars-d at puremagic.com...
>>>
>>> > I would point out that there is an intention to eventually get a D
>>> > lexer and parser into Phobos so that tools can take advantage of them.
>>> > Those could eventually lead to a frontend in D but would provide
>>> > benefits far beyond simply having the compiler in D.
>>>
>>> Is the interest more in a D-specific lexer/parser or a generalized one?
>>> Or is it more of a split vote? I seem to remember interest both ways,
>>> but I don't know whether there's any consensus among the DMD/Phobos crew.
>>>
>>> A generalized lexer is nothing more than a regex engine that has more
>>> than one distinct accept state (which then gets run over and over until
>>> EOF). And the FSM is made simply by doing a combined regex
>>> "(regexForToken1 | regexForToken2 | regexForToken3 | ... )", and then
>>> each of those parts just gets its own accept state. Which makes me
>>> wonder...
>>>
>>> There was a GSoC project to overhaul Phobos's regex engine, wasn't there?
>>> Is that done? Is it designed in a way that the stuff above wouldn't be
>>> real hard to add?
>>>
>>> And what about the algorithm? Is it a Thompson NFA, i.e., does it
>>> traverse the NFA as if it were a DFA, effectively "creating" the DFA
>>> on-the-fly? Or does it just traverse the NFA as an NFA? Or does it
>>> create an actual DFA and traverse that? An actual DFA would probably be
>>> best for a lexer. If a DFA, is it an optimized DFA? In my (limited)
>>> tests, it didn't seem like DFA-optimization would yield a notable
>>> benefit on typical programming-language tokens. It seems to be more
>>> suited to pathological cases.
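The "traverse the NFA as if it were a DFA" idea amounts to tracking the *set* of NFA states that are simultaneously live, taking epsilon-closures as you go, so each input step behaves like one DFA transition. A minimal sketch in Python (not what Phobos's engine actually does), with a hand-built, hypothetical NFA for the regex `a(b|c)*` where `None` marks an epsilon transition:

```python
# Hypothetical NFA for a(b|c)*: state -> list of (symbol, next_state).
# None as the symbol denotes an epsilon transition.
nfa = {
    0: [("a", 1)],
    1: [(None, 2), (None, 4)],
    2: [("b", 3), ("c", 3)],
    3: [(None, 2), (None, 4)],
    4: [],                      # accept state
}
ACCEPT = {4}

def eclosure(states):
    """All states reachable from `states` via epsilon transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for sym, t in nfa.get(s, []):
            if sym is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def matches(text):
    # The live-state set is, in effect, one DFA state built on the fly.
    current = eclosure({0})
    for ch in text:
        step = {t for s in current
                  for sym, t in nfa.get(s, []) if sym == ch}
        current = eclosure(step)
        if not current:
            return False
    return bool(current & ACCEPT)
```

So `matches("abcb")` is `True` and `matches("b")` is `False`; caching each distinct live-state set as it appears is exactly what "creating the DFA on-the-fly" means.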
>>
>> There is some desire to have a lexer and parser in Phobos which
>> basically have the same implementation as dmd (only in D instead of
>> C++). That way, they're very close to the actual compiler, and it's easy
>> to port fixes and improvements between the two.
>>
>> However, we definitely also want a more general lexer/parser generator
>> which takes advantage of D's metaprogramming capabilities. Andrei was
>> pushing more for that and doesn't really like the idea of the other,
>> since it would reduce the desire to produce the more general solution.
>> So, there _is_ some dissension on the matter. But there's definitely
>> room for both. It's just a question of time and manpower.
>>
>
> Boy, I gotta say I'm really tempted to tackle this. I don't know if I
> *should* dedicate my already-tight time, but it's very tempting. And I have
> already written a generalized lexer generator in D (
> www.semitwist.com/goldie ), so I have that experience (and codebase) to draw
> upon.
>
> Only big question is whether it would be best to try to make Phobos's
> existing regex engine flexible enough that it could be used by the lexer
> (since a generalized lexer is essentially a regex engine with multiple
> accept states, and optionally some customizable hooks). I've posted some
> questions to that end in another branch of this thread.
>


More information about the Digitalmars-d mailing list