[GSOC Draft Proposal] ANTLR and Java based D parser for IDE usage

Mon Apr 4 10:26:08 PDT 2011

On 29/03/2011 19:51, Luca Boasso wrote:
> Timeline
> --------
>
> This is a tentative timeline to be further discussed with the help of the
> community.
> I am committed to dedicating substantial time to this project, knowing that I
> also have to pass some exams.
> I estimate that I could spend initially approximately 30h/week.
> After the exam session I will work full-time on this project.
>
> - April 25 – May 23: Community Bonding Period
>    Since I am new to the D community I will spend some time learning how to
>    contribute while following the guidelines of the community and the DDT
>    project.
>    I will start reviewing the language reference asking for clarifications
>    when needed.
>    Once I have got an overall understanding I will write the production rules of
>    a superset of the D grammar in the ANTLR grammar notation (similar to EBNF).
>
> - May 23 – July 11: Developing phase I
>    The correctness of the parser is of paramount importance.
>    I will create many tests to exercise the parser (at this point just a
>    "recognizer") obtained as output from ANTLR.
>    Once I am confident that the parser conforms to the language reference and
>    recognizes the same language as the parser in DMD, I will enhance it with AST
>    construction rules.
>    At this point, I need to discuss with the DDT team the type of AST that has to
>    be built for IDE purposes, and confirm which annotations are most useful
>    (e.g. source ranges).
>
> - July 11 – August 15: Developing phase II
>    In this phase I will create unit tests to verify the correctness of the
>    generated trees and I will focus on the integration of the parser with the DDT
>    project.
>    In the remaining time I will provide good error recovery to the parser and I
>    will improve the overall performance.
>
> - August 15 - August 22: Final phase
>    I will use this last week to polish the code and improve the documentation.
>    As a final task, I will think about how support for incremental parsing can be
>    added in the future.

In line with my previous comments on the proposal, I have some comments
regarding the timeline as well. They are somewhat general, since it may
not be worthwhile to go into much detail on the timeline unless the
proposal is actually accepted.

There is not much point in writing tests for a recognizer-only parser,
in other words, tests that only check whether the parser accepts the
source as valid or not. We can just feed it a lot of existing valid
source files (like Phobos, Tango, etc.) and check that the parser
accepts all of them. (That doesn't test the *invalid* syntax cases,
but for an IDE parser that is less important than making sure it
is correct for the *valid* syntax cases.)
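
To make the corpus idea concrete, here is a minimal sketch of such a check
in Java. Everything in it is an assumption for illustration: the class and
method names are made up, and bracesBalanced is only a stand-in for the
recognizer, since the real check would call the parser generated by ANTLR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class CorpusCheck {

    // Hypothetical stand-in for the generated recognizer: it only checks that
    // curly braces are balanced. A real run would call the ANTLR-generated parser.
    static boolean bracesBalanced(String source) {
        int depth = 0;
        for (char c : source.toCharArray()) {
            if (c == '{') depth++;
            if (c == '}' && --depth < 0) return false;
        }
        return depth == 0;
    }

    // Reads every *.d file under root into a map from path to source text.
    static Map<String, String> loadSources(Path root) {
        Map<String, String> sources = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".d"))
                 .forEach(p -> {
                     try {
                         byte[] bytes = Files.readAllBytes(p);
                         sources.put(p.toString(), new String(bytes, StandardCharsets.UTF_8));
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sources;
    }

    // Returns the names of the sources that the recognizer rejects.
    static List<String> rejected(Map<String, String> sources, Predicate<String> recognizer) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> e : sources.entrySet()) {
            if (!recognizer.test(e.getValue())) failures.add(e.getKey());
        }
        return failures;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: CorpusCheck <source-root>");
            return;
        }
        Map<String, String> sources = loadSources(Paths.get(args[0]));
        List<String> failures = rejected(sources, CorpusCheck::bracesBalanced);
        System.out.println(failures.size() + " of " + sources.size() + " files rejected");
        failures.forEach(System.out::println);
    }
}
```

Pointed at a checkout of Phobos or Tango, any file it lists is either a
parser bug or a source file that is not actually valid, so the list is
worth going through by hand.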

The other thing is that AST generation with all the necessary info is 
probably going to be the most significant aspect of this project, in 
terms of effort required. And to implement the AST actions, I suspect it 
might be necessary (or at least desirable) to change the language 
grammar to better suit the actions that generate the AST.
So with this in mind, I think that instead of writing a complete D
language recognizer first and then adding the AST generation
functionality, it would be better to start with an AST-generating
parser for a very limited D-like subset language (for example, a
language with just variable, class, and function/function-parameter
declarations). Once we have that, the grammar can be expanded until it
supports D1/D2 and has all the extra minutiae.
The point of this is to develop a prototype with the essential and more
difficult aspects of the parser (AST generation, source ranges, some
error correction) as soon as possible, and the extra stuff afterwards,
instead of the other way around.
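
As a rough idea of the size of such a prototype, here is a sketch of an
AST-generating parser for a tiny D-like subset. It is a hand-written
recursive-descent parser in Java, not ANTLR output, and it only handles
variable and class declarations (no functions). The class name
MiniDParser, the Node layout, and the choice of character offsets for
source ranges are all assumptions for illustration, not a proposed design
for DDT.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniDParser {

    // AST node carrying its source range [start, end) as character offsets.
    static class Node {
        final String kind, name;
        final int start, end;
        final List<Node> children;
        Node(String kind, String name, int start, int end, List<Node> children) {
            this.kind = kind; this.name = name;
            this.start = start; this.end = end;
            this.children = children;
        }
    }

    private final String src;
    private int pos = 0;

    private MiniDParser(String src) { this.src = src; }

    // module := decl*
    static Node parse(String src) {
        MiniDParser p = new MiniDParser(src);
        return new Node("Module", "", 0, src.length(), p.decls(false));
    }

    // Parses declarations until end of input, or until '}' inside a class body.
    private List<Node> decls(boolean inClass) {
        List<Node> out = new ArrayList<>();
        skipSpace();
        while (pos < src.length() && !(inClass && src.charAt(pos) == '}')) {
            out.add(decl());
            skipSpace();
        }
        return out;
    }

    // decl := 'class' IDENT '{' decl* '}' | IDENT IDENT ';'
    private Node decl() {
        skipSpace();
        int start = pos;
        String first = ident();
        if (first.equals("class")) {
            String name = ident();
            expect('{');
            List<Node> members = decls(true);
            expect('}');
            return new Node("Class", name, start, pos, members);
        }
        String name = ident(); // 'first' was the type name
        expect(';');
        return new Node("Variable", name, start, pos, new ArrayList<>());
    }

    private String ident() {
        skipSpace();
        int start = pos;
        while (pos < src.length()
                && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) pos++;
        if (start == pos) throw new IllegalStateException("identifier expected at offset " + pos);
        return src.substring(start, pos);
    }

    private void expect(char c) {
        skipSpace();
        if (pos >= src.length() || src.charAt(pos) != c)
            throw new IllegalStateException("'" + c + "' expected at offset " + pos);
        pos++;
    }

    private void skipSpace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        Node module = parse("int a; class C { int b; }");
        for (Node d : module.children)
            System.out.println(d.kind + " " + d.name + " [" + d.start + "," + d.end + ")");
    }
}
```

Even at this size the parser already has to decide where a node's range
starts and ends and how nested declarations attach to their parent, and
those are the decisions I would like to see settled early.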

-- 
Bruno Medeiros - Software Engineer