Let's stop parser Hell

Sat Jul 7 15:24:59 PDT 2012

On Saturday, 7 July 2012 at 22:07:02 UTC, Roman D. Boiko wrote:
> On Saturday, 7 July 2012 at 21:52:09 UTC, David Piepgrass wrote:
>> it seems easier to tell what the programmer "meant" with three 
>> phases, in the face of errors. I mean, phase 2 can tell when 
>> braces and parentheses are not matched up properly and then it 
>> can make reasonable guesses about where those missing 
>> braces/parentheses were meant to be, based on things like 
>> indentation. That would be especially helpful when the parser 
>> is used in an IDE, since if the IDE guesses the intention 
>> correctly, it can still understand broken code and provide 
>> code completion for it. And since phase 2 is a standard tool, 
>> anybody's parser can use it.
>
> There could be multiple errors that compensate each other and 
> make your phase 2 succeed and prevent phase 3 from doing proper 
> error handling. Even knowing that there is an error, in many 
> cases you would not be able to create a meaningful error 
> message. And any error would make your phase-2 tree incorrect, 
> so it would be difficult to recover from it by inserting an 
> additional token or ignoring tokens until the parser is able to 
> continue its work properly. All this would suffer for the same 
> reason: you lose information.

This is all true, but forgetting a brace commonly results in a 
barrage of error messages anyway. Code that guesses what you 
meant obviously won't work all the time, and phase 3 would have 
to take care not to emit an error message about a "{" token that 
doesn't actually exist (that was merely "guessed-in"). But at 
least it's nice for a parser to be /able/ to guess what you 
meant; for a typical parser it would be out of the question, upon 
detecting an error, to back up four source lines, insert a brace 
and try again.
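
To make the "guessed-in" idea concrete, here is a rough sketch of what 
the brace-matching part of phase 2 might look like. It is only an 
illustration, not a worked-out design: the names (Brace, match_braces, 
the "guessed" flag) are made up for this post, it looks at raw 
characters rather than lexer tokens (so braces inside strings and 
comments would confuse it), and it only guesses missing closing braces. 
The heuristic is the indentation one described above: when a "}" sits 
on a line that is indented less than the line of the innermost open 
"{", the inner block was probably meant to be closed earlier.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Brace:
    kind: str              # "{" or "}"
    line: int              # 1-based line where the brace was seen (or guessed)
    indent: int            # indentation of the block this brace belongs to
    guessed: bool = False  # True if the matcher inserted it; not in the source


def match_braces(source: str) -> list[Brace]:
    """Return the brace sequence with guessed closing braces filled in."""
    out: list[Brace] = []
    stack: list[Brace] = []  # "{" braces that are still open
    lineno = 0
    for lineno, text in enumerate(source.splitlines(), start=1):
        stripped = text.lstrip()
        if not stripped:
            continue  # blank lines carry no indentation information
        indent = len(text) - len(stripped)
        for ch in stripped:
            if ch == "{":
                brace = Brace("{", lineno, indent)
                stack.append(brace)
                out.append(brace)
            elif ch == "}":
                # Openers on more deeply indented lines than this "}" were
                # probably meant to be closed earlier, so guess their "}".
                # Keep one opener on the stack for the real "}" to match.
                while len(stack) > 1 and stack[-1].indent > indent:
                    opener = stack.pop()
                    out.append(Brace("}", lineno, opener.indent, guessed=True))
                if stack:
                    stack.pop()
                    out.append(Brace("}", lineno, indent))
                # A stray "}" with nothing open is simply dropped here.
    # Anything still open at end of input gets a guessed "}".
    while stack:
        opener = stack.pop()
        out.append(Brace("}", lineno, opener.indent, guessed=True))
    return out
```

Given a function whose inner "if (x) {" block is never closed, this 
produces one extra "}" marked guessed=True just before the real "}" of 
the function. Phase 3 could check that flag to avoid reporting errors 
that point at a token the user never typed, and an IDE could still 
build a usable tree for code completion.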