Looking for champion - std.lang.d.lex

Walter Bright newshound2 at digitalmars.com
Mon Oct 25 20:48:30 PDT 2010


Nick Sabalausky wrote:
> "Walter Bright" <newshound2 at digitalmars.com> wrote in message 
> news:ia59si$1r0j$1 at digitalmars.com...
>> Consider a string literal, say "abc\"def". With Goldie's method, I infer 
>> this string has to be scanned twice. Once to find its limits, and the 
>> second to convert it to the actual string.
> 
> Yea, that is true. With that string in the input, the value given to the 
> user code will be:
> 
> assert(tokenObtainedFromGoldie.toString() == q{"abc\"def"});
> 
> That's a consequence of the grammar being separated from lexing/parsing 
> implementation.
> 
> You're right that that does seem less than ideal. Although I'm not sure how 
> to remedy that without loosing the independence between grammar and 
> lex/parse implementation that is the main point of the GOLD-based style.
> 
> But there's something I don't quite understand about the approach you're 
> suggesting: You seem to be suggesting that a terminal be progressively 
> converted into its final form *as* it's still in the process of being 
> recognized by the DFA. Which means, you don't know *what* you're supposed to 
> be converting it into *while* you're converting it. Which means, you have to 
> be speculatively converting it into all types of tokens that the current DFA 
> state could possibly be on its way towards accepting (also, the DFA would 
> need to contain a record of possible terminals for each DFA state). And then 
> the result is thrown away if it turns out to be a different terminal. Is 
> this correct? If so, is there generally enough lexical difference between 
> the terminals that need such treatment to compensate for the extra 
> processing needed in situations that are closer to worst-case (that is, in 
> comparison to Goldie's current approach)?

Probably that's why I don't use lexer generators. Building lexers is the 
simplest part of building a compiler, and I've always been motivated by trying 
to make it as fast as possible.

To specifically answer your question, yes, in the lexers I make, you know you're 
parsing a string, so you process it as you parse it.



> If all of that is so, then what would be your thoughts on this approach?:
> 
> Suppose Goldie had a way to associate an optional "simultaneous/lockstep 
> conversion" to a type of terminal. For instance:
> 
> myLanguage.associateConversion("StringLiteral", new 
> StringLiteralConverter());
> 
> Then, 'StringLiteralConverter' would be something that could be either 
> user-provided or offered by Goldie (both ways would be supported). It would 
> be some sort of class or something that had three basic functions:
> 
> class StringLiteralConverter : ITerminalConverter
> {
>     void process(dchar c) {...}
> 
>     // Or maybe this to make it possible to minimize allocations
>     // in certain circumstances by utilizing slices:
>     void process(dchar c, size_t indexIntoSource, string fullOrignalSource) 
> {...}
> 
>     Variant emit() {...}
>     void clear() {...}
> }
> 
> Each state in the lexer's DFA would know which terminals it could possibly 
> be processing. And for each of those terminals that has an associated 
> converter, the lexer will call 'process()'. If a terminal is accepted, 
> 'emit' is called to get the final result (and maybe do any needed 
> finalization first), and then 'clear' is called on all converters that had 
> been used.
> 
> This feature would preclude the use of the actual "GOLD Parser Builder" 
> program, but since I'm writing a tool to handle that functionality anyway, 
> I'm not too concerned about that.
> 
> Do you think that would work? Would its benefits be killed by the overhead 
> introduced? If so, could those overheads be sufficiently reduced without 
> scrapping the general idea?

I don't know. I'd have to study the issue for a while. I suggest taking a look 
at dmd's lexer and compare. I'm not sure what Spirit's approach to this is.


>> What Goldie will be compared against is Spirit. Spirit is a reasonably 
>> successful add-on to C++. Goldie doesn't have to do things the same way as 
>> Spirit (expression templates - ugh), but it should be as easy to use and 
>> at least as powerful.
>>
> 
> Understood.


More information about the Digitalmars-d mailing list