Let's stop parser Hell

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Jul 7 13:27:12 PDT 2012


On 08-Jul-12 00:09, Andrei Alexandrescu wrote:
> On 7/7/12 3:17 PM, Dmitry Olshansky wrote:
>> I'll have to point out that the whole point about integrated lexing is
>> moot. It's more of liability then benefit. At very least it's just
>> implementation curiosity not advantage.
>
> Interesting. I'll have to ask for more details on why.
>

I think I told that before but anyway the main point is:
the reason lexer exists is performance.

It's obvious that one can write grammar without ever using lexer. If the 
notation is good say regex or the one used in PEG (that is not quite the 
same it turns out) then token can be easily describe in grammar. Once we 
got lexer can give only 2 things:
	-manageability, it's jsut splitting grammar in 2 parts that could be 
maintained separately (in fact some "lexers" are not DFA nor regular)
(sometimes it could go the opposite direction - it could be  better to 
have one integrated grammar)
	- speed. The reason it could be faster is because lexer can use highly 
deterministic grammar.

Now back to PEGs and packrats. What I see everywhere I look is 
essentially integration of a backtracking-NFA lexer, that is not fast 
nor is particularly convenient. The notation is the whole other matter.

So my view on the matter: whether or not lexer is integrated is two-fold 
question: notation vs implementation.

E.g. it may make regular lexer-like things a part of notation thus 
having rules like:
	Identifier  -> Alpha (Alpha|Number|_)*

Yet it may or may use the same engine for _all_ _rules_, in fact if 
implementation optimize things by ripping off regular parts of grammar 
to small DFA engines it would make some kind of lexer behind the scenes!

Anyway highly hybrid parsers are all the rage now, and reasons are 
obvious - they runs fast and can be bend by some magic to quite 
convoluted grammars of modern languages ;)

-- 
Dmitry Olshansky




More information about the Digitalmars-d mailing list