Newline character set in the D lexer - NEL

Dominikus Dittes Scherkl dominikus at scherkl.de
Mon Aug 31 09:14:07 UTC 2020


On Monday, 31 August 2020 at 01:49:06 UTC, Cecil Ward wrote:
> Would there be any benefit from the following suggestion: add
> the Unicode character NEL (U+0085) to the set of EndOfLine
> characters in the lexer?
>
> Cecil Ward.

I personally think we should have these definitions:

              /*  NUL    EM    SUB */
EndOfFile   = { 0x00 | 0x19 | 0x1A | PhysicalEndOfFile };
              /*  LF     FF     CR      CR LF     NEL     LSEP     PSEP  */
EndOfLine   = { 0x0A | 0x0C | 0x0D | 0x0D 0x0A | 0x85 | 0x2028 | 0x2029 | EndOfFile };

              /*  HT     VT     SP    NBSP    NQSP     MQSP     ENSP     EMSP     3/MSP */
WhiteSpace  = { 0x09 | 0x0B | 0x20 | 0xA0 | 0x2000 | 0x2001 | 0x2002 | 0x2003 | 0x2004

              /*  4/MSP    6/MSP     FSP      PSP     THSP      HSP     ZWSP     NNBSP */
               | 0x2005 | 0x2006 | 0x2007 | 0x2008 | 0x2009 | 0x200A | 0x200B | 0x202F

              /*  MMSP      WJ      IDSP    ZWNBSP */
               | 0x205F | 0x2060 | 0x3000 | 0xFEFF | EndOfLine };

The current definition of D source files is missing quite a lot of these :-(

EM = End of Medium (what else, if not this, should end a file?!)
NEL = New Line
LSEP = Line Separator
PSEP = Paragraph Separator
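The EndOfFile and EndOfLine rules above can be illustrated with a small sketch. This is only an illustration in Python, not code from DMD or the D specification; the helper name `split_lines` is hypothetical, and it assumes the source has already been decoded to Unicode text, so NEL is the code point U+0085 (in a UTF-8 file it is the two bytes C2 85, never a lone 0x85 byte).

```python
from __future__ import annotations

# Code points (not bytes) from the proposed EndOfFile and EndOfLine sets.
END_OF_FILE = {0x00, 0x19, 0x1A}                        # NUL, EM, SUB
END_OF_LINE = {0x0A, 0x0C, 0x0D, 0x85, 0x2028, 0x2029}  # LF, FF, CR, NEL, LSEP, PSEP


def split_lines(src: str) -> list[str]:
    """Split decoded source text into lines under the proposed rules (hypothetical helper).

    - CR LF is consumed as a single EndOfLine.
    - An EndOfFile character, or the physical end of the string, ends the
      current line and stops scanning; anything after it is ignored.
    - Text ending in a line break yields a trailing empty line, as str.split does.
    """
    lines = []
    start = 0
    i = 0
    while i < len(src) and ord(src[i]) not in END_OF_FILE:
        c = ord(src[i])
        if c in END_OF_LINE:
            lines.append(src[start:i])
            if c == 0x0D and i + 1 < len(src) and src[i + 1] == "\n":
                i += 1  # skip the LF of a CR LF pair
            start = i + 1
        i += 1
    lines.append(src[start:i])  # EndOfFile also terminates the last line
    return lines


print(split_lines("a\r\nb\x85c\u2028d"))  # ['a', 'b', 'c', 'd']
print(split_lines("x\x1ay\nz"))           # ['x']
```

Because CR LF is checked before CR alone, a DOS line ending counts as one line break, while LF CR still counts as two.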

NBSP = non-breaking space
NQSP = ENSP = N-wide space
MQSP = EMSP = M-wide space
3/MSP = 1/3 M-wide space (three of them together are as wide as an M)
4/MSP = 1/4 M-wide space
6/MSP = 1/6 M-wide space
FSP = figure space
PSP = point space
THSP = thin space
HSP = hair space
ZWSP = zero width space
NNBSP = narrow non-breaking space
MMSP = medium mathematical space
WJ = word joiner (invisible zero-width character that prevents a line
break between the words it joins)
IDSP = ideographic space (same width as a Chinese character)
ZWNBSP = zero-width non-breaking space
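A similar sketch for the proposed WhiteSpace set, again in Python, with hypothetical helper names (`is_white_space`, `skip_white_space`) and assuming already-decoded text. Since the proposal puts EndOfFile inside EndOfLine and EndOfLine inside WhiteSpace, the skipping loop has to stop explicitly at an EndOfFile character instead of consuming it.

```python
# Code points from the proposed sets; WhiteSpace includes EndOfLine,
# which in turn includes EndOfFile.
END_OF_FILE = {0x00, 0x19, 0x1A}                        # NUL, EM, SUB
END_OF_LINE = {0x0A, 0x0C, 0x0D, 0x85, 0x2028, 0x2029}  # LF, FF, CR, NEL, LSEP, PSEP
HORIZONTAL = {
    0x09, 0x0B, 0x20, 0xA0,                  # HT, VT, SP, NBSP
    *range(0x2000, 0x200C),                  # NQSP .. ZWSP (U+2000..U+200B)
    0x202F, 0x205F, 0x2060, 0x3000, 0xFEFF,  # NNBSP, MMSP, WJ, IDSP, ZWNBSP
}
WHITE_SPACE = HORIZONTAL | END_OF_LINE | END_OF_FILE


def is_white_space(ch: str) -> bool:
    """True if the single character ch is in the proposed WhiteSpace set."""
    return ord(ch) in WHITE_SPACE


def skip_white_space(src: str, pos: int = 0) -> int:
    """Return the index of the first non-white-space character at or after pos.

    Stops at an EndOfFile character even though it is formally white space,
    so the lexer never reads past the logical end of the file.
    """
    while (pos < len(src) and is_white_space(src[pos])
           and ord(src[pos]) not in END_OF_FILE):
        pos += 1
    return pos


print(skip_white_space("\u00a0\u2009\x85x"))          # 3
print(skip_white_space(" \x00 x"))                    # 1
print("\u200b".isspace(), is_white_space("\u200b"))   # False True
```

The last line shows why an explicit table is needed: Python's `str.isspace()` returns False for U+200B, U+2060 and U+FEFF, because Unicode classifies them as format characters rather than spaces.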