std.d.lexer requirements

Sat Aug 4 04:32:22 PDT 2012

On 04-Aug-12 14:55, Christophe Travert wrote:
> Dmitry Olshansky, in message (digitalmars.D:174214), wrote:
>> Most likely - since you re-read the same memory twice to do it.
>
> You're probably right, but if you do this right after the token is
> generated, the memory should still be near the processor. And the
> operation on the first read should be very basic: just check nothing
> illegal appears, and check for the end of the token.

Consider literals such as:

q{ .. }
"\x13\x27 ...\u1212"

In most cases it takes about the same time to check a literal for
correctness and output its decoded value as it does to simply skip past
it. (See also re-decoding Unicode in identifiers, though Unicode
characters in identifiers are rare.)

> The cost is not
> negligible, but what you do with literal tokens can vary much, and what
> the lexer will propose may not be what the user wants, so the user may
> suffer the cost of the literal decoding (including allocation of the
> decoded string, the copy of the characters, etc.) that he doesn't want,
> or will have to re-do it his own way...
>
I see it as a compile-time policy, which would fit nicely and solve both
issues. Just provide a template with a few hooks, and add a Noop policy
that does nothing.
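
To make the idea concrete, here is a rough sketch of what such a policy
could look like. Everything in it is hypothetical and only for
illustration: the names scanString, NoopPolicy, DecodePolicy and the
begin/put hooks are my assumptions, not a proposed std.d.lexer API. It
handles only a small subset of escapes, and it treats \xNN as a code
point, which is a simplification. The scanner validates the literal in a
single pass and hands each decoded character to the policy; with the Noop
policy the hook calls are empty and nothing is allocated.

```d
import std.ascii : isHexDigit;
import std.utf : decode;

// Hypothetical policy: ignores everything, so the scan only validates.
struct NoopPolicy
{
    void begin() {}
    void put(dchar) {}
}

// Hypothetical policy: collects the decoded value of the literal.
struct DecodePolicy
{
    dchar[] value;
    void begin() { value.length = 0; }
    void put(dchar c) { value ~= c; }
}

// Numeric value of an ASCII hex digit (caller has checked isHexDigit).
uint hexValue(char h)
{
    return h <= '9' ? h - '0' : (h | 0x20) - 'a' + 10;
}

// Scans a double-quoted literal at the start of src in one pass.
// Supports only \xNN, \uNNNN, \n, \\ and \" in this sketch.
// Returns the number of code units consumed, or 0 if malformed.
size_t scanString(Policy)(const(char)[] src, ref Policy policy)
{
    if (src.length == 0 || src[0] != '"')
        return 0;
    policy.begin();
    size_t i = 1;
    while (i < src.length)
    {
        immutable dchar c = decode(src, i); // advances i; throws on bad UTF-8
        if (c == '"')
            return i;                       // closing quote: done
        if (c != '\\')
        {
            policy.put(c);
            continue;
        }
        if (i >= src.length)
            return 0;
        immutable char e = src[i++];
        size_t digits;
        switch (e)
        {
            case 'x': digits = 2; break;
            case 'u': digits = 4; break;
            case 'n': policy.put('\n'); continue;
            case '\\', '"': policy.put(e); continue;
            default: return 0;              // escape not handled here
        }
        if (src.length - i < digits)
            return 0;
        uint code = 0;
        foreach (_; 0 .. digits)
        {
            immutable char h = src[i++];
            if (!isHexDigit(h))
                return 0;
            code = code * 16 + hexValue(h);
        }
        if (code >= 0xD800 && code <= 0xDFFF)
            return 0;                       // surrogates are not valid code points
        policy.put(cast(dchar) code);
    }
    return 0;                               // unterminated literal
}

unittest
{
    auto src = `"a\x41\u00e9" rest`;

    NoopPolicy noop;
    assert(scanString(src, noop) == 13);    // validated, nothing stored

    DecodePolicy dec;
    assert(scanString(src, dec) == 13);
    assert(dec.value == "aA\u00e9"d);       // decoded value available
}
```

The user who only wants token boundaries instantiates the scanner with
NoopPolicy and pays for the validation pass alone; the user who wants the
decoded value picks (or writes) a policy that stores it however he likes.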

-- 
Dmitry Olshansky