[GSOC Draft Proposal] ANTLR and Java based D parser for IDE usage

Mon Apr 4 13:16:50 PDT 2011

Thank you for your comments.

Here is the updated timeline. I am always open to further advice:

- April 25 – May 23: Community Bonding Period
  Since I am new to the D community, I will spend some time learning how to
  contribute while following the guidelines of the community and the DDT
  project.
  I will start by reviewing the language reference, asking for clarifications
  when needed.
  Once I have an overall understanding, I will write the production rules of
  a subset of the D grammar (D0) in the ANTLR grammar notation (similar to
  EBNF).
  Since AST generation is a key factor for a correct integration with DDT, I
  will enhance the D0 parser with AST construction rules from the beginning.
  At this point, I need to discuss with the DDT team the type of AST that has
  to be built for IDE purposes, and confirm which annotations are most useful
  (e.g. source ranges).

- May 23 – July 11: Developing phase I
  A fully functional D0 parser will be integrated into DDT.
  Once the integration is complete, I will extend the parser to handle a
  superset of the D1 and D2 grammars.
  To check the correctness of the parser, I will test it against large
  existing D code bases (such as Phobos, Tango, and the source code from
  Andrei's TDPL book).
  Subsequently, I will modify the tree construction rules to reflect the
  changes in the syntax.

- July 11 – August 15: Developing phase II
  In this phase I will create unit tests to verify the correctness of the
  generated trees, and I will focus on the remaining aspects of the
  integration with the DDT project.
  In the remaining time I will add good error recovery to the parser and
  improve its overall performance.

- August 15 – August 22: Final phase
  I will use this last week to polish the code and improve the documentation.
  As a final task, I will think about how support for incremental parsing can be
  added in the future.
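
To make the "AST with source ranges" idea concrete, here is a minimal sketch
of what a D0 node could carry. It is hand-written in plain Java and is not
ANTLR output. The names D0Sketch and VarDecl, and the single rule
"<type> <name>;", are hypothetical and only serve as illustration. The real
node types still have to be agreed with the DDT team.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical D0 sketch: accepts only declarations of the form "<type> <name>;". */
final class D0Sketch {

    /** AST node for one variable declaration, annotated with its source range. */
    static final class VarDecl {
        final String type;
        final String name;
        final int start; // offset of the first character of the declaration
        final int end;   // offset just past the closing ';'

        VarDecl(String type, String name, int start, int end) {
            this.type = type;
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    private final String src;
    private int pos = 0;

    private D0Sketch(String src) {
        this.src = src;
    }

    /** Parses a source text made only of variable declarations. */
    static List<VarDecl> parse(String src) {
        D0Sketch p = new D0Sketch(src);
        List<VarDecl> decls = new ArrayList<>();
        p.skipWhitespace();
        while (p.pos < src.length()) {
            decls.add(p.varDecl());
            p.skipWhitespace();
        }
        return decls;
    }

    // varDecl : IDENT IDENT ';'
    private VarDecl varDecl() {
        int start = pos;
        String type = identifier();
        skipWhitespace();
        String name = identifier();
        skipWhitespace();
        expect(';');
        return new VarDecl(type, name, start, pos);
    }

    // Simplified: any run of letters, digits and '_' counts as an identifier.
    private String identifier() {
        int begin = pos;
        while (pos < src.length()
                && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
            pos++;
        }
        if (begin == pos) {
            throw new IllegalArgumentException("identifier expected at offset " + pos);
        }
        return src.substring(begin, pos);
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw new IllegalArgumentException("'" + c + "' expected at offset " + pos);
        }
        pos++;
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        for (VarDecl d : parse("int x;\nstring name;")) {
            System.out.println(d.type + " " + d.name + " [" + d.start + ", " + d.end + ")");
        }
    }
}
```

Recording start and end offsets on every node from the first rule onwards is
what would let an IDE map a selection or a caret position back to the AST.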

On 4/4/11, Bruno Medeiros <brunodomedeiros+spam at com.gmail> wrote:
> On 29/03/2011 19:51, Luca Boasso wrote:
>> Timeline
>> --------
>>
>> This is a tentative timeline to be further discussed with the help of the
>> community.
>> I am committed to dedicate substantially to this project knowing that I
>> also have to pass some exams.
>> I estimate that I could spend initially approximately 30h/week.
>> After the exam session I will work full-time on this project.
>>
>> - April 25 – May 23: Community Bonding Period
>>    Since I am new in the D community I will spend some time learning how
>>    to contribute while following the guideline of the community and the
>>    DDT project.
>>    I will start reviewing the language reference asking for clarifications
>>    when needed.
>>    Once I have got an overall understanding I will write the production
>>    rules of a superset of the D grammar in the ANTLR grammar notation
>>    (similar to EBNF).
>>
>> - May 23 – July 11: Developing phase I
>>    The correctness of the parser is of paramount importance.
>>    I will create many tests to exercise the parser (at this point just a
>>    "recognizer") obtained as output from ANTLR.
>>    Once I am confident with the parser conforms to the language reference
>>    and recognizes the same language as the parser in DMD, I will enhance
>>    it with AST construction rules.
>>    At this point, I need to discuss with the DDT team the type of AST that
>>    has to be built for IDEs purposes, and confirm which annotations are
>>    most useful (eg. source ranges).
>>
>> - July 11 – August 15: Developing phase II
>>    In this phase I will create unit tests to verify the correctness of the
>>    generated trees and I will focus on the integration of the parser with
>>    the DDT project.
>>    In the remaining time I will provide good error recovery to the parser
>>    and I will improve the overall performance.
>>
>> - August 15 - August 22: Final phase
>>    I will use this last week to polish the code and improve the
>>    documentation.
>>    As a final task, I will think about how support for incremental parsing
>>    can be added in the future.
>
> In line with my previous comments on the proposal, I have some comments
> regarding the timeline as well. They are somewhat general comments, it
> may not be that worthwhile to go into much detail in the timeline aspect
> unless the proposal is actually accepted.
>
> There is not much point in writing tests for a language-recognizer only
> parser, in other words, a test that only checks if the parser recognizes
> the source as valid or not. We can just feed a lot of existing valid
> source files (like Phobos, Tango, etc.) and check that the parser
> validates them correctly. (That doesn't test the *invalid* syntax cases,
> but that's a less important case for an IDE parser than making sure it
> is correct for the *valid* syntax cases)
>
> The other thing is that AST generation with all the necessary info is
> probably going to be the most significant aspect of this project, in
> terms of effort required. And to implement the AST actions, I suspect it
> might be necessary (or at least desirable) to change the language
> grammar to better suit the actions that generate the AST.
> So with this in mind, I think it would be better that, instead of doing
> a complete D language recognizer first and then adding the AST
> generation functionality, what should be done first is an AST-generating
> parser for a very limited D-like subset language (for example, a
> language with just variable, class, and function/function-parameter
> declarations), and then when we have this, to start expanding the
> grammar until it supports D1/D2 and has all the extra minutiae.
> The point of this is to develop a prototype with the essential and more
> difficult aspects of the parser (AST generation, source ranges, some
> error correction) as soon as possible, and the extra stuff afterwards,
> instead of the other way around.
>
> --
> Bruno Medeiros - Software Engineer
>
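
The corpus check suggested above (feeding existing valid sources such as
Phobos or Tango to the parser and listing what it rejects) could be sketched
roughly as follows. This is only an illustration under stated assumptions:
CorpusCheck and rejectedFiles are hypothetical names, the recognizer is a
plain Predicate<String> standing in for the generated parser, and sources are
assumed to be UTF-8 files ending in ".d".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical harness: runs a recognizer over every .d file under a directory. */
final class CorpusCheck {

    /** Returns the .d files under root whose text the recognizer does not accept. */
    static List<Path> rejectedFiles(Path root, Predicate<String> recognizer)
            throws IOException {
        List<Path> sources;
        try (Stream<Path> paths = Files.walk(root)) {
            sources = paths.filter(Files::isRegularFile)
                           .filter(p -> p.getFileName().toString().endsWith(".d"))
                           .sorted()
                           .collect(Collectors.toList());
        }
        List<Path> rejected = new ArrayList<>();
        for (Path file : sources) {
            // Assumption: the corpus is UTF-8 encoded.
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (!recognizer.test(text)) {
                rejected.add(file);
            }
        }
        return rejected;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Stand-in that accepts everything; replace with a call to the real parser.
        Predicate<String> recognizer = text -> true;
        for (Path file : rejectedFiles(root, recognizer)) {
            System.out.println("rejected: " + file);
        }
    }
}
```

Since every file in such a corpus is expected to be valid, any path printed
by this loop would point at a grammar rule that needs another look.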