Handling of U+2028 and U+2029 in source code
deadalnix
deadalnix at gmail.com
Tue Oct 17 23:18:29 UTC 2023
On Tuesday, 17 October 2023 at 00:37:41 UTC, Richard (Rikki)
Andrew Cattermole wrote:
> https://github.com/dlang/dmd/blob/master/compiler/src/dmd/lexer.d#L578
>
> Basically, when it's in a multi-byte UTF-8 character, it checks
> if it's in the non-ASCII character ranges. No special handling of
> newlines is provided, but probably should be.
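I've noticed that in the past, and it is clearly wrong. It's
not just whitespace: punctuation, emoji, and a ton of other non-ASCII
code points that simply are not identifier characters get accepted too.
The lexer should check each code point against the proper set of
identifier-start characters instead of accepting anything non-ASCII.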
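
To illustrate the difference, here is a rough sketch in Python (not D, and not the dmd lexer code). It assumes that "the proper set" is something like Unicode's XID_Start property, which Python's str.isidentifier() applies to a one-character string; the table the D spec actually mandates may differ. The function names naive_is_ident_start and strict_is_ident_start are made up for this example.

```python
import unicodedata


def naive_is_ident_start(ch: str) -> bool:
    """Hypothetical stand-in for the behaviour criticised above:
    every non-ASCII code point is treated as an identifier character."""
    if ord(ch) < 0x80:
        return ch == "_" or ch.isalpha()
    return True


def strict_is_ident_start(ch: str) -> bool:
    """Accept only code points Python considers valid at the start of an
    identifier (XID_Start plus underscore). This is an assumption standing
    in for whatever table the language spec requires."""
    return ch.isidentifier()


if __name__ == "__main__":
    samples = ["a", "\u00e9", "\u2028", "\u2029", "\u2026", "\U0001F600"]
    for ch in samples:
        print(
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),  # e.g. Zl for U+2028, Zp for U+2029
            "naive:", naive_is_ident_start(ch),
            "strict:", strict_is_ident_start(ch),
        )
```

The naive check lets U+2028 (category Zl), U+2029 (Zp), the ellipsis U+2026 and the emoji U+1F600 through as identifier starts, while the table-based check rejects all four and still accepts a letter such as U+00E9.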