[GSoC’11] Lexing and parsing

Wed Mar 23 17:57:02 PDT 2011

On Wed, 23 Mar 2011 13:31:04 -0400, Ilya Pupatenko <pupatenko at gmail.com>  
wrote:

>> I'm not qualified to speak on Spirit's internal architecture; I've only
>> used it once for something very simple and ran into a one-liner bug
>> which remains unfixed 7+ years later. But the basic API of Spirit would
>> be wrong for D. “it is possible to write a highly-integrated
>> lexer/parser generator in D without resorting to additional tools” does
>> not mean "the library should allow programmer to write grammar directly
>> in D (ideally, the syntax should be somehow similar to EBNF)"; it means
>> that the library should allow you to write a grammar in EBNF and then
>> through a combination of templates, string mixins and compile-time
>> function evaluation generate the appropriate (hopefully optimal) parser.
>> D's compile-time programming abilities are strong enough to do the code
>> generation job usually left to separate tools. Ultimately a user of the
>> library should be able to declare a parser something like this:
>>
>> // Declare a parser for Wikipedia's EBNF sample language
>> Parser!`
>> (* a simple program syntax in EBNF − Wikipedia *)
>> program = 'PROGRAM' , white space , identifier , white space ,
>>             'BEGIN' , white space ,
>>             { assignment , ";" , white space } ,
>>             'END.' ;
>> identifier = alphabetic character , { alphabetic character | digit } ;
>> number = [ "-" ] , digit , { digit } ;
>> string = '"' , { all characters - '"' } , '"' ;
>> assignment = identifier , ":=" , ( number | identifier | string ) ;
>> alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
>>                       | "H" | "I" | "J" | "K" | "L" | "M" | "N"
>>                       | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
>>                       | "V" | "W" | "X" | "Y" | "Z" ;
>> digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
>> white space = ? white space characters ? ;
>> all characters = ? all visible characters ? ;
>> ` wikiLangParser;
>
> Ok, it sounds good. But in most cases we are interested in more than
> whether the input text matches the specified grammar. We want to perform
> semantic actions while parsing, for example building some kind of AST or
> evaluating an expression. But I have no idea how I could ask this parser
> to perform user-defined actions, for example for the 'string' and
> 'number' "nodes" in this case.

I don't have any experience with using parser generators, but using arrays  
of delegates works really well for GUI libraries. For example:

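// Action for the `digit` rule: turn the matched character into an int.
wikiLangParser.digit ~= (ref wikiLangParser.Token digit) {
	auto tokens = digit.tokens;
	assert(tokens.length == 1);
	// The leading `0 +` forces the character arithmetic to int.
	digit.value = 0 + (tokens.front.value.get!string.front - '0');
};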

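// Action for the `number` rule: fold the digit values into one int.
wikiLangParser.number ~= (ref wikiLangParser.Token number) {
	auto tokens = number.tokens;
	assert(!tokens.empty);

	// An optional leading "-" token marks a negative number.
	bool negative = false;
	if(tokens.front.value.get!string == "-") {
		negative = true;
		tokens.popFront();
	}

	// The remaining tokens are digits already converted by the digit action.
	int value = 0;
	foreach(token; tokens) {
		value = value * 10 + token.value.get!int;
	}

	if(negative)
		value = -value;

	number.value = value;
};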

debug {
	// A second, independent action on the same rule (needs std.stdio).
	wikiLangParser.number ~= (ref wikiLangParser.Token number) {
		writeln("Parsed number (",number.value,")");
	};
}
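
For what it's worth, here is a minimal sketch of the same idea that compiles on its own. There is no generated parser behind it: Token, NumberParser and makeNumberParser are hypothetical names made up for illustration, and parse() simply treats the whole input as a single 'number' match. It only shows how a rule can hold an array of delegates and run them in registration order.

```d
import std.stdio : writeln;

// Hypothetical token type; a generated parser would define its own.
struct Token
{
	string text; // source text matched by the rule
	int value;   // result filled in by semantic actions
}

alias Action = void delegate(ref Token);

// Hypothetical stand-in for a generated parser with a single rule.
struct NumberParser
{
	Action[] number; // actions run, in order, whenever `number` matches

	// No real parsing here: treat the whole input as one `number` match.
	Token parse(string input)
	{
		auto token = Token(input);
		foreach (action; number)
			action(token);
		return token;
	}
}

// Build a parser with one value-computing action attached.
NumberParser makeNumberParser()
{
	NumberParser parser;
	parser.number ~= delegate(ref Token t) {
		string s = t.text;
		bool negative = false;
		if (s.length > 0 && s[0] == '-')
		{
			negative = true;
			s = s[1 .. $];
		}

		// Assumes the remaining characters are ASCII digits.
		int value = 0;
		foreach (char c; s)
			value = value * 10 + (c - '0');

		t.value = negative ? -value : value;
	};
	return parser;
}

void main()
{
	auto parser = makeNumberParser();

	// A second action on the same rule; it sees the value set by the first.
	parser.number ~= delegate(ref Token t) {
		writeln("Parsed number (", t.value, ")");
	};

	auto token = parser.parse("-42");
	assert(token.value == -42);
}
```

The nice property is the same as in GUI code: the library never needs to know what the actions do, and user code can attach or stack them after the parser type has been generated.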