Handling of U+2028 and U+2029 in source code

deadalnix deadalnix at gmail.com
Tue Oct 17 23:18:29 UTC 2023


On Tuesday, 17 October 2023 at 00:37:41 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
> https://github.com/dlang/dmd/blob/master/compiler/src/dmd/lexer.d#L578
>
> Basically, if it's in a multi-byte UTF-8 character, it checks 
> whether it's in the non-ASCII character ranges. No special 
> handling of new lines is provided, but there probably should be.

I noticed that in the past, and it is clearly wrong. It's 
not just whitespace like U+2028 and U+2029; it's also punctuation, 
emoji, and a ton of other stuff that just isn't identifier characters.

The lexer should check the character against the proper identifier-start character set.
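A minimal sketch of the idea, written in Python rather than D and not taken from dmd's lexer: classify a decoded code point by its identifier-start property instead of accepting every non-ASCII character, and treat U+2028/U+2029 as line terminators. It assumes Python's str.isidentifier() (which, for a single character, accepts '_' or an XID_Start code point) as a stand-in for a real identifier table; classify_start and count_lines are hypothetical names.

```python
import unicodedata

# Hypothetical helpers for illustration only; not dmd's lexer.
# Assumption: \n, \r, CR LF, U+2028 and U+2029 all end a line.
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}


def classify_start(ch: str) -> str:
    """Classify a single code point that might begin a token."""
    if ch in ("\u2028", "\u2029"):
        # LINE SEPARATOR / PARAGRAPH SEPARATOR: a line break, not an identifier.
        return "newline"
    if ch.isidentifier():
        # For one character this is True only for '_' or XID_Start code points.
        return "identifier_start"
    if unicodedata.category(ch) == "Zs":
        # Other Unicode space separators, e.g. U+00A0 NO-BREAK SPACE.
        return "whitespace"
    # Punctuation, symbols, emoji, etc.: not valid at identifier start.
    return "error"


def count_lines(text: str) -> int:
    """Count lines, treating CR LF as one terminator and U+2028/U+2029 as newlines."""
    lines = 1
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r" and text[i + 1:i + 2] == "\n":
            i += 1  # skip the LF so CR LF is counted once
        if ch in LINE_TERMINATORS:
            lines += 1
        i += 1
    return lines


if __name__ == "__main__":
    for ch in ["é", "π", "\u2028", "\u00a0", "😀", "¿"]:
        print(f"U+{ord(ch):04X}", classify_start(ch))
    print(count_lines("a\u2028b\u2029c"))
```

A custom loop is used instead of str.splitlines() because splitlines() also breaks on characters such as U+000B, U+000C and U+0085, which a lexer may not want to treat as line ends.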


More information about the Digitalmars-d mailing list