Feedback needed: Complete symbol approach for Bison's D backend

Sat Nov 14 15:50:23 UTC 2020

Hello!

I need some feedback about the return value of yylex() in Bison's 
Lexer class, which must be provided by the user.

This method should provide the Bison parser with three values:
the TokenKind (which is the current return value), the semantic
value, and, optionally, the location. Currently the last two are
set in yylex(), stored in the lexer class, and retrieved by the
Bison parser through getters.
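
To make the current contract concrete, here is a minimal sketch of a hand-written lexer in that style. It is not the generated Bison code: TokenKind, Location, CalcLexer and the getter names are simplified stand-ins chosen for illustration, and the lexer only understands single-line input made of numbers and '+'.

```d
import std.ascii : isDigit;

// Illustrative stand-ins; the real types come from the generated parser.
enum TokenKind { EOF, NUM, PLUS, INVALID }
struct Location { int line = 1; int column = 1; }

class CalcLexer
{
    private string input;
    private size_t pos;
    private int value_;      // semantic value of the last token
    private Location loc_;   // location of the last token

    this(string text) { input = text; }

    // Getters the parser would call after each yylex().
    int semanticVal() { return value_; }
    Location location() { return loc_; }

    // Returns only the kind; the other two values are side effects.
    TokenKind yylex()
    {
        while (pos < input.length && input[pos] == ' ')
            pos++;
        // Forgetting this assignment would leave a stale location behind.
        loc_ = Location(1, cast(int) pos + 1);
        if (pos == input.length)
            return TokenKind.EOF;
        if (input[pos] == '+')
        {
            pos++;
            return TokenKind.PLUS;
        }
        if (!isDigit(input[pos]))
        {
            pos++;
            return TokenKind.INVALID;
        }
        value_ = 0;
        while (pos < input.length && isDigit(input[pos]))
            value_ = value_ * 10 + (input[pos++] - '0');
        return TokenKind.NUM;
    }
}

unittest
{
    auto lexer = new CalcLexer("12 + 7");
    assert(lexer.yylex() == TokenKind.NUM);
    assert(lexer.semanticVal() == 12);
    assert(lexer.location() == Location(1, 1));
    assert(lexer.yylex() == TokenKind.PLUS);
    assert(lexer.location() == Location(1, 4));
}
```

Note that nothing forces yylex() to update value_ or loc_ before returning, which is exactly where a user can slip.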

The other parsers provide the option of complete symbols, which 
means that yylex()'s return value is changed to a structure that 
binds together the TokenKind, the semantic value, and the 
location. Internally, the structure is immediately divided into 
its components, which continue to be used separately throughout 
the parser.
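
A rough sketch of what the complete symbol style could look like from the user's side. The Symbol layout, the int semantic value and the CalcLexer class are assumptions made for illustration and do not necessarily match the struct in the branch linked at the end of this post.

```d
import std.ascii : isDigit;

// Illustrative stand-ins; the real types come from the generated parser.
enum TokenKind { EOF, NUM, PLUS, INVALID }
struct Location { int line = 1; int column = 1; }

// Hypothetical complete symbol: all three values travel together.
struct Symbol
{
    TokenKind kind;
    int value;
    Location location;
}

class CalcLexer
{
    private string input;
    private size_t pos;

    this(string text) { input = text; }

    // Every return statement has to build a full Symbol, so no
    // component can be silently left over from the previous token.
    Symbol yylex()
    {
        while (pos < input.length && input[pos] == ' ')
            pos++;
        auto loc = Location(1, cast(int) pos + 1);
        if (pos == input.length)
            return Symbol(TokenKind.EOF, 0, loc);
        if (input[pos] == '+')
        {
            pos++;
            return Symbol(TokenKind.PLUS, 0, loc);
        }
        if (!isDigit(input[pos]))
        {
            pos++;
            return Symbol(TokenKind.INVALID, 0, loc);
        }
        int value = 0;
        while (pos < input.length && isDigit(input[pos]))
            value = value * 10 + (input[pos++] - '0');
        return Symbol(TokenKind.NUM, value, loc);
    }
}

unittest
{
    auto lexer = new CalcLexer("12 + 7");
    Symbol s = lexer.yylex();
    // The parser would split the symbol right away and keep
    // working with the separate components.
    TokenKind kind = s.kind;
    int value = s.value;
    Location loc = s.location;
    assert(kind == TokenKind.NUM && value == 12 && loc == Location(1, 1));
}
```

The lexer no longer needs member variables or getters for the value and the location; the cost is one small struct built and returned per token.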

The big advantage of the complete symbol is that it is
beginner-friendly: it reduces the errors that occur when the
user forgets to set one of the values.
The main disadvantage is the overhead the structure may add to
the parser, since one is created and destroyed for every token
the lexer returns.

Should we keep both versions, or move to the complete symbol
approach only? Given that Bison's current release still has D as
an experimental feature, this would not be a breaking change. If
we decide to keep both, the complete symbol approach will be
selected through a Bison directive, like in the other parsers.

An example of the current method, using TokenKind:
https://github.com/akimd/bison/blob/master/examples/d/calc/calc.y#L117

An example using the Symbol struct:
https://github.com/adelavais/bison/blob/complete-external-symbols/examples/d/calc/calc.y#L117