Looking for champion - std.lang.d.lex
Walter Bright
newshound2 at digitalmars.com
Mon Oct 25 20:48:30 PDT 2010
Nick Sabalausky wrote:
> "Walter Bright" <newshound2 at digitalmars.com> wrote in message
> news:ia59si$1r0j$1 at digitalmars.com...
>> Consider a string literal, say "abc\"def". With Goldie's method, I infer
>> this string has to be scanned twice. Once to find its limits, and the
>> second to convert it to the actual string.
>
> Yea, that is true. With that string in the input, the value given to the
> user code will be:
>
> assert(tokenObtainedFromGoldie.toString() == q{"abc\"def"});
>
> That's a consequence of the grammar being separated from lexing/parsing
> implementation.
>
> You're right that that does seem less than ideal. Although I'm not sure how
> to remedy that without loosing the independence between grammar and
> lex/parse implementation that is the main point of the GOLD-based style.
>
> But there's something I don't quite understand about the approach you're
> suggesting: You seem to be suggesting that a terminal be progressively
> converted into its final form *as* it's still in the process of being
> recognized by the DFA. Which means, you don't know *what* you're supposed to
> be converting it into *while* you're converting it. Which means, you have to
> be speculatively converting it into all types of tokens that the current DFA
> state could possibly be on its way towards accepting (also, the DFA would
> need to contain a record of possible terminals for each DFA state). And then
> the result is thrown away if it turns out to be a different terminal. Is
> this correct? If so, is there generally enough lexical difference between
> the terminals that need such treatment to compensate for the extra
> processing needed in situations that are closer to worst-case (that is, in
> comparison to Goldie's current approach)?
Probably that's why I don't use lexer generators. Building lexers is the
simplest part of building a compiler, and I've always been motivated by trying
to make it as fast as possible.
To specifically answer your question, yes, in the lexers I make, you know you're
parsing a string, so you process it as you parse it.
> If all of that is so, then what would be your thoughts on this approach?:
>
> Suppose Goldie had a way to associate an optional "simultaneous/lockstep
> conversion" to a type of terminal. For instance:
>
> myLanguage.associateConversion("StringLiteral", new
> StringLiteralConverter());
>
> Then, 'StringLiteralConverter' would be something that could be either
> user-provided or offered by Goldie (both ways would be supported). It would
> be some sort of class or something that had three basic functions:
>
> class StringLiteralConverter : ITerminalConverter
> {
> void process(dchar c) {...}
>
> // Or maybe this to make it possible to minimize allocations
> // in certain circumstances by utilizing slices:
> void process(dchar c, size_t indexIntoSource, string fullOrignalSource)
> {...}
>
> Variant emit() {...}
> void clear() {...}
> }
>
> Each state in the lexer's DFA would know which terminals it could possibly
> be processing. And for each of those terminals that has an associated
> converter, the lexer will call 'process()'. If a terminal is accepted,
> 'emit' is called to get the final result (and maybe do any needed
> finalization first), and then 'clear' is called on all converters that had
> been used.
>
> This feature would preclude the use of the actual "GOLD Parser Builder"
> program, but since I'm writing a tool to handle that functionality anyway,
> I'm not too concerned about that.
>
> Do you think that would work? Would its benefits be killed by the overhead
> introduced? If so, could those overheads be sufficiently reduced without
> scrapping the general idea?
I don't know. I'd have to study the issue for a while. I suggest taking a look
at dmd's lexer and compare. I'm not sure what Spirit's approach to this is.
>> What Goldie will be compared against is Spirit. Spirit is a reasonably
>> successful add-on to C++. Goldie doesn't have to do things the same way as
>> Spirit (expression templates - ugh), but it should be as easy to use and
>> at least as powerful.
>>
>
> Understood.
More information about the Digitalmars-d
mailing list