std.d.lexer: pre-voting review / discussion
Jos van Uden
usenet at fwend.com
Thu Sep 26 09:47:06 PDT 2013
On 26-9-2013 17:41, Dominikus Dittes Scherkl wrote:
> Hello.
>
> I'm not sure if this belongs here, but I think there is a bug at the very start of the Lexer chapter:
>
> Is U+001A really meant to end the source file?
> According to the Unicode specification this is the "substitute" character, similar to the newer U+FFFC. Or is it simply a typo, and U+0019 was intended to
> end the source (this would fit, as it means "End of Medium")?
>
> I don't know whether anybody has ever ended a source file that way, or whether it was tested.
>
> More important to me is that none of the space characters beyond ASCII are
> considered whitespace (starting with U+00A0 NBSP, the different wide spaces
> U+2000 to U+200B up to the exotic stuff U+202F, U+205F, U+2060, U+3000 and
> the famous U+FEFF). Why?
> OK, the set is much larger, but for end-of-line the Unicode versions (U+2028 and U+2029) are included. This seems inconsistent to me.
I imagine the lexer follows the language specification:
http://dlang.org/lex.html#EndOfFile
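
For reference, here is a minimal sketch of the character classes as I read them on that page: EndOfFile is the physical end of the file, U+0000 or U+001A; EndOfLine is CR, LF, CR LF, U+2028 or U+2029; WhiteSpace is only U+0020, U+0009, U+000B and U+000C. It is written in Python purely for illustration. The names END_OF_FILE, END_OF_LINE, WHITE_SPACE and classify are made up for this sketch and are not taken from std.d.lexer.

```python
# Character classes as listed in the D lexical grammar (dlang.org/lex.html).
# All names below are hypothetical; this is not the std.d.lexer implementation.
END_OF_FILE = {"\u0000", "\u001A"}                      # plus the physical end of input
END_OF_LINE = {"\u000D", "\u000A", "\u2028", "\u2029"}  # CR LF together is one EndOfLine
WHITE_SPACE = {"\u0020", "\u0009", "\u000B", "\u000C"}  # space, tab, VT, FF


def classify(ch):
    """Return the grammar category of a single character."""
    if ch in END_OF_FILE:
        return "EndOfFile"
    if ch in END_OF_LINE:
        return "EndOfLine"
    if ch in WHITE_SPACE:
        return "WhiteSpace"
    return "other"
```

Under this reading, classify("\u001A") gives "EndOfFile", while U+0019 and U+00A0 both fall into "other", which matches what you observed.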