std.d.lexer: pre-voting review / discussion

Jos van Uden usenet at fwend.com
Thu Sep 26 09:47:06 PDT 2013


On 26-9-2013 17:41, Dominikus Dittes Scherkl wrote:
> Hello.
>
> I'm not sure if this belongs here, but I think there is a bug at the very start of the Lexer chapter:
>
> Is U+001A really meant to end the source file?
> According to the Unicode specification this is a "replacement character", like the newer U+FFFC. Or is it simply a typo, and U+0019 was intended to
> end the source? (That would fit, as it means "End of Medium".)
>
> I don't know if anybody has ever ended a source file that way, or if it was tested.
>
> More important to me is that none of the space characters beyond ASCII are
> considered whitespace (starting with U+00A0 NBSP, the various width spaces
> U+2000 to U+200B, up to the exotic ones U+202F, U+205F, U+2060, U+3000 and
> the famous U+FEFF). Why?
> OK, the set is much larger, but for end-of-line the Unicode versions (U+2028 and U+2029) were added as well. This seems inconsistent to me.

I imagine the lexer follows the language specification:

http://dlang.org/lex.html#EndOfFile
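To make the point concrete, here is a minimal Python sketch of the character classes involved. It is not std.d.lexer code. The sets are my transcription of the EndOfFile, EndOfLine and WhiteSpace rules on that spec page as they read at the time, so treat them as an assumption, and `classify` and `effective_source` are hypothetical helpers. It also prints the Unicode general category of the code points from the question. For reference, U+001A is the ASCII SUB control character (Ctrl-Z), which DOS traditionally used as an end-of-file marker.

```python
import unicodedata

# Assumption: these sets transcribe the EndOfFile, EndOfLine and WhiteSpace
# rules from dlang.org/lex.html as they read in 2013; not std.d.lexer itself.
END_OF_FILE = {"\u0000", "\u001A"}  # in addition to the physical end of file
END_OF_LINE = {"\u000D", "\u000A", "\u2028", "\u2029"}  # a CR LF pair is one EndOfLine
WHITE_SPACE = {"\u0020", "\u0009", "\u000B", "\u000C"}  # ASCII spaces only


def effective_source(text: str) -> str:
    """Hypothetical helper: truncate at the first EndOfFile character, if any."""
    hits = [i for i, ch in enumerate(text) if ch in END_OF_FILE]
    return text[: hits[0]] if hits else text


def classify(ch: str) -> str:
    """Hypothetical helper: map one code point to its (simplified) grammar class."""
    if ch in END_OF_FILE:
        return "EndOfFile"
    if ch in END_OF_LINE:
        return "EndOfLine"
    if ch in WHITE_SPACE:
        return "WhiteSpace"
    return "other"


if __name__ == "__main__":
    # Code points raised in the question, with their Unicode general category.
    for cp in (0x001A, 0x00A0, 0x2000, 0x200B, 0x2028, 0x202F,
               0x205F, 0x2060, 0x3000, 0xFEFF):
        ch = chr(cp)
        print(f"U+{cp:04X}  {unicodedata.category(ch)}  {classify(ch)}")
    # Everything from the first Ctrl-Z onwards is dropped.
    print(repr(effective_source("int x;\u001Aanything after Ctrl-Z is ignored")))
```

Run as a script, it classifies U+2028 as EndOfLine, while U+00A0, U+202F and U+3000 (general category Zs) fall through to "other". U+200B, U+2060 and U+FEFF report category Cf (format characters), not Zs. So a rule that simply adopted Unicode's space separators would still not treat those three as whitespace.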


More information about the Digitalmars-d mailing list