Writing a Parser - Walnut and aPaGeD comments
Jascha Wetzel
firstname at mainia.de
Sat Jan 12 04:55:25 PST 2008
Dan wrote:
>> That depends on the grammar that is being parsed. Simple grammars can
>> often be parsed faster with hand-optimized parsers. The more complex the
>> grammar is, the less impact the generated parsers' overhead has.
>
> Ah, but there's always overhead.
What i meant was, that this overhead has a purpose. Once your grammar
uses it, it's not overhead anymore.
Automatically generated parsers usually facilitate bottom-up parsing
algorithms (LALR, GLR) that have several advantages over top-down
algorithms (LL(k), usually implemented as recursive descent) that are
used in manually written parsers.
LALR for example is completely iterative (i.e. no stack action, no
calls, less code -> completely cached), making it more efficient than
recursive descent. GLR is almost completely iterative, depending on the
implementation.
If an automatically generated parser is slower, it is usually due to
features that make it more flexible wrt. to grammars. Therefore, as
stated initially, once you need this flexibility, the parser performs
nicely.
Another thing is backtracking. RD can use lookahead, but it is more
elaborate to implement it manually and generally, top-down lookahead
isn't as effective as bottom-up lookahead. That is, it takes more
complex grammars to create the need for backtracking in a bottom-up
parser than in a top-down parser. That is where bottom-up parsing is
categorically faster than top-down.
All this makes me claim that it'll be hard to write for example a
recursive descent D parser by hand, that performs better than it's
automatically generated GLR counterpart. You might still succeed by a
small margin, but you will have spent a *lot* more time to do so. And
you will have a parser that is not near as easily maintainable as the
grammar used for the automatically generated parser.
>> Apaged was tailored for parsing D, and it's very fast for that. Last
>> time i checked, it parsed the complete Tango package in less than 2
>> seconds (including disk io).
>
> Eep. That would instantly make any JavaScript interpreter a failure; scripts need to *run* in the < 500ms (unnoticeable) range to even be considered.
The tango package has >400 files with multiple MB of code. I doubt there
is any script or library written in JavaScript that is even close to
that size. Parsing single files <1000 LoC is a matter of <1ms with
apaged or any other LALR/GLR parser, no need to worry ;).
> So you mean to say, it does it by looking at the tokens before and after and inferring the value based on whether an operator or operand is expected?
>
> That makes sense. I did it similar to that.
In this case it even only needs to look at the next token. You're
usually right with what you do to solve such a case if you don't need to
backtrack. If so, you should double check that there is no other way.
More information about the Digitalmars-d-learn
mailing list