std.d.lexer requirements

Jonathan M Davis jmdavisProg at gmx.com
Fri Aug 3 19:31:22 PDT 2012


On Thursday, August 02, 2012 11:08:23 Walter Bright wrote:
> The tokens are not kept, correct. But the identifier strings, and the string
> literals, are kept, and if they are slices into the input buffer, then
> everything I said applies.

String literals often _can't_ be slices unless you leave them in their
original state rather than giving the version that they translate to (e.g.
leaving \&copy; in the string rather than replacing it with its actual
Unicode value). And since you're not going to be able to create the literal
using whatever type the range is unless it's a string of some variety, that
means that the literals often can't be slices, which - depending on the
implementation - would make it so that they can't _ever_ be slices.
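
To make the point above concrete, here is a minimal sketch of a literal
decoder. It assumes the input is a plain string, and it only knows a handful
of escapes (\n, \t, \\, \" and the single named entity \&copy;). The name
literalValue is made up for this illustration and is not part of any proposed
std.d.lexer API. When the literal has no escapes, the value can be a slice of
the input; as soon as there is one escape, the value no longer matches the
source text and a new buffer is needed.

```d
import std.array : appender;
import std.string : indexOf;

/// Hypothetical helper: returns the value of a string literal body
/// (the text between the quotes). Only a few escapes are handled.
string literalValue(string raw)
{
    // No escapes: the value is identical to the source text, so a slice works.
    if (raw.indexOf('\\') == -1)
        return raw;

    // At least one escape: the value differs from the source text,
    // so it has to be built in a new buffer.
    auto buf = appender!string();
    for (size_t i = 0; i < raw.length; ++i)
    {
        if (raw[i] != '\\' || i + 1 >= raw.length)
        {
            buf.put(raw[i]);
            continue;
        }
        ++i; // skip the backslash; raw[i] is now the escape character
        switch (raw[i])
        {
            case 'n':  buf.put('\n'); break;
            case 't':  buf.put('\t'); break;
            case '\\': buf.put('\\'); break;
            case '"':  buf.put('"');  break;
            case '&':
            {
                // Named entity such as \&copy; (only "copy" is known here).
                auto rest = raw[i .. $];
                auto semi = rest.indexOf(';');
                if (semi != -1 && rest[1 .. semi] == "copy")
                {
                    buf.put("\u00A9");
                    i += semi; // the loop's ++i then steps past the ';'
                }
                else
                    buf.put("\\&"); // unknown entity: left untranslated
                break;
            }
            default:
                // Escapes this sketch does not know are left untranslated.
                buf.put('\\');
                buf.put(raw[i]);
                break;
        }
    }
    return buf.data;
}

unittest
{
    string plain = `hello`;
    assert(literalValue(plain) is plain);           // same memory: a slice
    assert(literalValue(`\&copy; 2012`) == "\u00A9 2012");
}
```

With a range type that isn't a string at all, even the first branch would not
be available, since there would be nothing to slice into a string.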

Identifiers are a different story, since they don't have to be translated at 
all, but regardless of whether keeping a slice would be better than creating a 
new string, the identifier table will be far superior, since then you only need 
one copy of each identifier. So, it ultimately doesn't make sense to use slices 
in either case even without considering issues like them being spread across 
memory.
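
A minimal sketch of such an identifier table follows, assuming a plain
associative array is good enough for illustration. IdentifierTable and intern
are hypothetical names chosen for this example, not an existing Phobos or dmd
API. Each distinct spelling is copied once, and every later occurrence gets
the same canonical string back, so nothing keeps the input buffer alive.

```d
/// Hypothetical identifier table: maps each spelling to one canonical copy.
struct IdentifierTable
{
    private string[string] table;

    /// Returns the canonical string for `text`, copying it only the
    /// first time that spelling is seen.
    string intern(const(char)[] text)
    {
        if (auto p = text in table)
            return *p;                // already known: no allocation
        string owned = text.idup;     // one copy per distinct identifier
        table[owned] = owned;
        return owned;
    }

    /// Number of distinct identifiers stored so far.
    size_t length() const { return table.length; }
}

unittest
{
    IdentifierTable ids;
    char[] input = "foo bar foo".dup;   // stands in for the lexer's input buffer

    auto first  = ids.intern(input[0 .. 3]);
    auto second = ids.intern(input[8 .. 11]);

    assert(first is second);            // both occurrences share one copy
    assert(first.ptr !is input.ptr);    // and that copy is not a slice of the input
    assert(ids.length == 1);
}
```

Since every token for the same identifier then holds the same string,
comparing identifiers can be done with `is` rather than a character-by-character
comparison.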

The only place that I'd expect a slice in a token is in the string which 
represents the text which was lexed, and that won't normally be kept around.

- Jonathan M Davis

