std.d.lexer requirements

Christophe Travert travert at phare.normalesup.org
Sat Aug 4 03:02:11 PDT 2012


Jonathan M Davis, in message (digitalmars.D:174191), wrote:
> On Thursday, August 02, 2012 11:08:23 Walter Bright wrote:
>> The tokens are not kept, correct. But the identifier strings, and the string
>> literals, are kept, and if they are slices into the input buffer, then
>> everything I said applies.
> 
> String literals often _can't_ be slices unless you leave them in their 
> original state rather than giving the version that they translate to (e.g. 
> leaving \&copy; in the string rather than replacing it with its actual, 
> unicode value). And since you're not going to be able to create the literal 
> using whatever type the range is unless it's a string of some variety, that 
> means that the literals often can't be slices, which - depending on the 
> implementation - would make it so that they can't _ever_ be slices.
> 
> Identifiers are a different story, since they don't have to be translated at 
> all, but regardless of whether keeping a slice would be better than creating a 
> new string, the identifier table will be far superior, since then you only need 
> one copy of each identifier. So, it ultimately doesn't make sense to use slices 
> in either case even without considering issues like them being spread across 
> memory.
> 
> The only place that I'd expect a slice in a token is in the string which 
> represents the text which was lexed, and that won't normally be kept around.
> 
> - Jonathan M Davis

I thought it was not the lexer's job to process literals. It should just 
split the input into tokens and provide minimal info: TokenType, line and 
col, along with the representation from the input. That's enough for a 
syntax highlighting tool, for example. Otherwise you'll end up doing 
complex interpretation, and the lexer will not be that efficient. Literal 
interpretation can be done in a second step. Do you think doing literal 
interpretation separately, when you need it, would be less efficient?
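
To make the two-step idea concrete, here is a minimal sketch in D. It is 
only an illustration under assumptions: the names Token, lex and unescape 
are hypothetical (not a proposed std.d.lexer API), the token set is tiny, 
columns count code units, and string literals are assumed to be 
double-quoted and to fit on one line. The lexer stores only a slice of the 
input in each token; escape sequences are translated later, and only on 
request.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.stdio : writefln;

enum TokenType { identifier, number, stringLiteral, symbol }

struct Token
{
    TokenType type;
    size_t line; // 1-based
    size_t col;  // 1-based, counted in code units
    string text; // slice of the input: never copied, never translated
}

/// Step 1: split the input into tokens without interpreting anything.
Token[] lex(string src)
{
    Token[] tokens;
    size_t i = 0, line = 1, col = 1;
    while (i < src.length)
    {
        immutable c = src[i];
        if (c == '\n') { ++i; ++line; col = 1; continue; }
        if (isWhite(c)) { ++i; ++col; continue; }

        immutable start = i;
        TokenType type;
        if (isAlpha(c) || c == '_')
        {
            type = TokenType.identifier;
            while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_')) ++i;
        }
        else if (isDigit(c))
        {
            type = TokenType.number;
            while (i < src.length && isDigit(src[i])) ++i;
        }
        else if (c == '"')
        {
            type = TokenType.stringLiteral;
            ++i; // opening quote
            // Skip escaped characters so that \" does not end the literal.
            while (i < src.length && src[i] != '"')
                i += (src[i] == '\\' && i + 1 < src.length) ? 2 : 1;
            if (i < src.length) ++i; // closing quote
        }
        else
        {
            type = TokenType.symbol; // any other single character
            ++i;
        }
        tokens ~= Token(type, line, col, src[start .. i]);
        col += i - start;
    }
    return tokens;
}

/// Step 2: interpret a string literal only when its value is needed.
/// Handles just \n and \t; any other escaped character is kept as-is.
string unescape(string raw)
{
    assert(raw.length >= 2 && raw[0] == '"' && raw[$ - 1] == '"');
    auto inner = raw[1 .. $ - 1];
    string result;
    for (size_t i = 0; i < inner.length; ++i)
    {
        if (inner[i] == '\\' && i + 1 < inner.length)
        {
            ++i;
            switch (inner[i])
            {
                case 'n': result ~= '\n'; break;
                case 't': result ~= '\t'; break;
                default:  result ~= inner[i]; break;
            }
        }
        else
            result ~= inner[i];
    }
    return result;
}

void main()
{
    foreach (t; lex("x = \"a\\tb\""))
        writefln("%s %s:%s %s", t.type, t.line, t.col, t.text);
}
```

A syntax highlighter would stop after lex; a parser that needs the value 
of a literal would call unescape on that token's text and pay the 
allocation only then.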

-- 
Christophe

