[GSoC’11] Lexing and parsing

Wed Mar 23 05:22:10 PDT 2011

On 03/23/2011 05:39 AM, Robert Jacques wrote:
> On Tue, 22 Mar 2011 18:27:51 -0400, Ilya Pupatenko <pupatenko at gmail.com> wrote:
>
>> Hi,
>>
>> First of all, I want to be polite so I have to introduce myself (you can skip
>> this paragraph if you feel tired of newcomer-students’ posts). My name is
>> Ilya, I’m a Master student of IT department of Novosibirsk State University
>> (Novosibirsk, Russia). In Soviet period Novosibirsk became on of the most
>> important science center in the country and now there are very close
>> relations between University and Academy of Science. That’s why it’s
>> difficult and very interesting to study here. But I’m not planning to study
>> or work this summer, so I’ll be able to work (nearly) full time on GSoC
>> project. My primary specialization is seismic tomography inverse problems,
>> but I’m also interested in programming language implementation and
>> compilation theory. I have good knowledge of C++ and C# languages and
>> “intermediate” knowledge of D language, knowledge of compilation theory, some
>> experience in implementing lexers, parsers and translators, basic knowledge
>> of lex/yacc/antlr and some knowledge of Boost.Spirit library. I’m not an
>> expert in D now, but I willing to learn and to solve difficult tasks, that’s
>> why I decided to apply on the GSoC.
>>
>> I’m still working on my proposal (on task “Lexing and Parsing”), but I want
>> to write some general ideas and ask some questions.
>>
>> 1. It is said that “it is possible to write a highly-integrated lexer/perser
>> generator in D without resorting to additional tools”. As I understand, the
>> library should allow programmer to write grammar directly in D (ideally, the
>> syntax should be somehow similar to EBNF) and the resulting parser will be
>> generated by D compiler while compiling the program. This method allows
>> integration of parsing in D code; it can make code simpler and even sometimes
>> more efficient.
>> There is a library for C++ (named Boost.Spirit) that follows the same idea.
>> It provide (probably not ideal but very nice) “EBNF-like” syntax to write a
>> grammar, it’s quite powerful, fast and flexible. There are three parts in
>> this library (actually there are 4 parts but we’re not interested in
>> Spirit.Classic now):
>> • Spirit.Qi (parser library that allows to build recursive descent parsers);
>> • Spirit.Karma (generator library);
>> • Spirit.Lex (library usable to create tokenizers).
>> The Spirit library uses “C++ template black magic” heavily (for example, via
>> Boost.Fusion). But D has greater metaprogramming abilities, so it is possible
>> to implement the same functionality in easier and “clean” way.
>> So, the question is: is it a good idea if at least parser library
>> architecture will be somewhat similar to Spirit one? Of course it is not
>> about “blind” copying; but creating architecture for such a big system
>> completely from scratch is quite difficult indeed. If to be exact, I like an
>> idea of parser attributes, I like the way semantic actions are described, and
>> the “auto-rules” seems really useful.
>
> I'm not qualified to speak on Spirits internal architecture; I've only used it
> once for something very simple and ran into a one-liner bug which remains
> unfixed 7+ years later. But the basic API of Spirit would be wrong for D. “it
> is possible to write a highly-integrated lexer/perser generator in D without
> resorting to additional tools” does not mean "the library should allow
> programmer to write grammar directly in D (ideally, the syntax should be
> somehow similar to EBNF)" it means that the library should allow you to write a
> grammar in EBNF and then through a combination of templates, string mixins and
> compile-time function evaluation generate the appropriate (hopefully optimal)
> parser. D's compile-time programming abilities are strong enough to do the code
> generation job usually left to separate tools. Ultimately a user of the library
> should be able to declare a parser something like this:
>
> // Declare a parser for Wikipedia's EBNF sample language
> Parser!`
> (* a simple program syntax in EBNF − Wikipedia *)
> program = 'PROGRAM' , white space , identifier , white space ,
> 'BEGIN' , white space ,
> { assignment , ";" , white space } ,
> 'END.' ;
> identifier = alphabetic character , { alphabetic character | digit } ;
> number = [ "-" ] , digit , { digit } ;
> string = '"' , { all characters − '"' } , '"' ;
> assignment = identifier , ":=" , ( number | identifier | string ) ;
> alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
> | "H" | "I" | "J" | "K" | "L" | "M" | "N"
> | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
> | "V" | "W" | "X" | "Y" | "Z" ;
> digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
> white space = ? white space characters ? ;
> all characters = ? all visible characters ? ;
> ` wikiLangParser;

How does one solve the issues of self/mutual/circular pattern recursion at 
compile-time?

Denis
-- 
_________________
vita es estrany
spir.wikidot.com