[GSoC’11] Lexing and parsing

Ilya Pupatenko pupatenko at gmail.com
Tue Mar 22 15:27:51 PDT 2011


Hi,

First of all, to be polite, I should introduce myself (you can skip
this paragraph if you are tired of posts from newcomer students). My
name is Ilya, and I’m a Master’s student in the IT department of
Novosibirsk State University (Novosibirsk, Russia). In the Soviet
period Novosibirsk became one of the most important science centers in
the country, and there are still very close relations between the
University and the Academy of Sciences. That’s why it’s difficult and
very interesting to study here. I’m not planning to study or work this
summer, so I’ll be able to work (nearly) full time on a GSoC project.
My primary specialization is inverse problems in seismic tomography,
but I’m also interested in programming language implementation and
compilation theory. I have good knowledge of C++ and C#,
“intermediate” knowledge of D, knowledge of compilation theory, some
experience implementing lexers, parsers and translators, basic
knowledge of lex/yacc/ANTLR, and some knowledge of the Boost.Spirit
library. I’m not an expert in D yet, but I am willing to learn and to
solve difficult tasks, which is why I decided to apply to GSoC.

I’m still working on my proposal (for the “Lexing and Parsing” task),
but I want to share some general ideas and ask some questions.

1. It is said that “it is possible to write a highly-integrated
lexer/parser generator in D without resorting to additional tools”. As
I understand it, the library should allow the programmer to write a
grammar directly in D (ideally, the syntax should be similar to EBNF),
and the resulting parser will be generated by the D compiler while
compiling the program. This approach integrates parsing into D code;
it can make the code simpler and sometimes even more efficient.
There is a C++ library (Boost.Spirit) that follows the same idea. It
provides a (probably not ideal, but very nice) “EBNF-like” syntax for
writing grammars, and it is quite powerful, fast and flexible. There
are three parts in this library (actually there are four, but we’re
not interested in Spirit.Classic now):
• Spirit.Qi (a parser library for building recursive descent parsers);
• Spirit.Karma (a generator library);
• Spirit.Lex (a library for creating tokenizers).
The Spirit library uses “C++ template black magic” heavily (for
example, via Boost.Fusion). But D has greater metaprogramming
abilities, so it is possible to implement the same functionality in an
easier and “cleaner” way.
So, the question is: is it a good idea for at least the parser
library’s architecture to be somewhat similar to Spirit’s? Of course
it is not about “blind” copying, but creating the architecture for
such a big system completely from scratch is quite difficult indeed.
To be exact, I like the idea of parser attributes, I like the way
semantic actions are described, and the “auto-rules” seem really
useful.
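
To make the idea of an embedded, “EBNF-like” grammar with attributes
and semantic actions a bit more concrete, here is a minimal sketch. It
is written in Python rather than D, it builds the parser at run time
(not at compile time, as the task envisions), and every name in it
(Parser, token, many, evaluate) is made up for illustration; it is
neither Spirit’s API nor a proposed Phobos API. It only borrows
Spirit’s conventions of `a >> b` for a sequence, `a | b` for an
alternative and `p[action]` for a semantic action.

```python
import re


class Parser:
    """Wraps a function (text, pos) -> (attribute, new_pos), or None on failure."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, text, pos=0):
        return self.fn(text, pos)

    def __rshift__(self, other):
        # a >> b: sequence; the attribute is the pair of both attributes.
        def seq(text, pos):
            first = self(text, pos)
            if first is None:
                return None
            second = other(text, first[1])
            if second is None:
                return None
            return (first[0], second[0]), second[1]
        return Parser(seq)

    def __or__(self, other):
        # a | b: ordered alternative; try a first, then b.
        def alt(text, pos):
            result = self(text, pos)
            return result if result is not None else other(text, pos)
        return Parser(alt)

    def __getitem__(self, action):
        # p[action]: semantic action that transforms the attribute.
        def act(text, pos):
            result = self(text, pos)
            return None if result is None else (action(result[0]), result[1])
        return Parser(act)

    def many(self):
        # Zero or more repetitions; the attribute is a list.
        def rep(text, pos):
            items = []
            while True:
                result = self(text, pos)
                if result is None or result[1] == pos:  # stop on failure or empty match
                    return items, pos
                items.append(result[0])
                pos = result[1]
        return Parser(rep)


def token(pattern):
    """Match a regular expression after skipping leading whitespace."""
    rx = re.compile(r"\s*(" + pattern + ")")

    def tok(text, pos):
        m = rx.match(text, pos)
        return None if m is None else (m.group(1), m.end())
    return Parser(tok)


def fold(attr):
    """Semantic action: fold (first, [(op, value), ...]) into a number."""
    total, rest = attr
    for op, value in rest:
        total = total + value if op == "+" else total - value
    return total


# Grammar:  expr ::= number (('+' | '-') number)*
number = token(r"\d+")[int]
add_op = token(r"\+") | token(r"-")
expr = (number >> (add_op >> number).many())[fold]


def evaluate(text):
    """Parse the whole input as expr and return its computed attribute."""
    result = expr(text)
    if result is None or text[result[1]:].strip():
        raise ValueError("syntax error in %r" % text)
    return result[0]
```

With these definitions, evaluate("1 + 2 - 3 + 10") returns the
attribute computed by the semantic actions. Python has no unary `*`
operator to overload, hence the many() method; in D or C++ the grammar
could stay closer to EBNF.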

2. Boost.Spirit is a really large and complicated library, and I doubt
that it is possible to implement a library of comparable level in
three months. That’s why it is extremely important to have a plan
(which features should be implemented and how much time each will
take). I’m still working on it, but I have some preliminary questions.
Should I have a library that is proposed and accepted into Phobos
before the end of GSoC? Or is there no such strict timeframe, so that
I can propose the library when all the features I want to see are
implemented and well tested?
And another question: is it OK to concentrate first on the parser
library and then “move” to the other parts? Of course I can choose
another part to start work on, but it seems to me that the parser is
the most useful and interesting part.

3. Finally, what comes next. I’ll try to make a plan (which parts
should be implemented and when). Then I guess I need to describe the
proposed architecture in more detail, and probably provide some usage
examples(?). Is it OK if I publish the ideas here to get reviews?
Anyway, I’ll need some time to work on it.

Ilya.

P.S. The funny thing is that I found a minor bug in Phobos (#5736)
while trying (just for fun) to implement a tiny part of Spirit in D.
Submitting bugs seems to be an important part of the task too.


