Let's stop parser Hell

Christophe Travert travert at phare.normalesup.org
Wed Aug 1 00:12:30 PDT 2012


"Jonathan M Davis", in message (digitalmars.D:173860), wrote:
> struct Token
> {
>  TokenType type;
>  string str;
>  LiteralValue value;
>  SourcePos pos;
> }
> 
> struct SourcePos
> {
>  size_t line;
>  size_t col;
>  size_t tabWidth = 8;
> }

The occurrence of tabWidth surprises me.
What is col supposed to be? An index (code unit), a character number 
(code point), or an estimate of where the character is supposed to be 
printed on the line, given the provided tabWidth?
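
To make the ambiguity concrete, here is a minimal sketch that computes all 
three candidate meanings of col for one position. ColInfo and columnsAt are 
hypothetical names, not part of the proposed lexer, and the display column 
is deliberately naive: it expands tabs to the next multiple of tabWidth and 
assumes every other code point is exactly one column wide. It also assumes 
idx falls on a code point boundary.

```d
/// Hypothetical helper type: three possible meanings of "col".
struct ColInfo
{
    size_t codeUnit;   // index in UTF-8 code units (bytes)
    size_t codePoint;  // number of code points before the position
    size_t display;    // naive estimate of the printed column
}

/// Computes the three values for the code-unit offset `idx` in `line`.
/// Assumption: `idx` lies on a code point boundary.
ColInfo columnsAt(string line, size_t idx, size_t tabWidth = 8)
{
    ColInfo c;
    c.codeUnit = idx;
    // foreach with a dchar loop variable decodes the UTF-8 string.
    foreach (dchar ch; line[0 .. idx])
    {
        ++c.codePoint;
        if (ch == '\t')
            c.display += tabWidth - (c.display % tabWidth); // next tab stop
        else
            ++c.display; // naive: ignores combining and wide characters
    }
    return c;
}

unittest
{
    // Tab, then U+00E9 (two code units in UTF-8), a space, then '='.
    auto c = columnsAt("\t\u00E9 = 1;", 4);
    assert(c.codeUnit == 4);
    assert(c.codePoint == 3);
    assert(c.display == 10);
}
```

For the '=' in that line the three answers are 4, 3 and 10, so a single 
col field has to pick one of them and document which.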

I don't think the lexer can really try to calculate at which column the 
character is printed, since that depends on the editor (if you want to use 
the lexer for syntax highlighting, for example) and on how it supports 
combining characters, zero-width or multi-column characters, etc. (which 
you may not want to have to decode).

You may want to provide the number of tabs met so far. Note that there 
are other whitespace characters that you may want to count, but you 
shouldn't have a very complicated SourcePos structure. It might be easier 
to have whitespace, end-of-line and end-of-file tokens, and let the user 
filter out or take into account whatever they want. Or just let the 
user look into the original string...
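
A rough sketch of that alternative, under my own assumptions: RawKind, 
RawToken and lexAll are hypothetical names, the "lexer" only splits on 
spaces, tabs and '\n', and each token stores a slice of the source plus its 
code-unit offset rather than a column. It is only meant to show the shape 
of the idea, not a real D lexer.

```d
import std.algorithm : filter, map;
import std.array : array;

/// Hypothetical token kinds for this sketch.
enum RawKind { word, whitespace, endOfLine, endOfFile }

/// Hypothetical token: a slice of the source and where it starts.
struct RawToken
{
    RawKind kind;
    string str;    // slice of the original source
    size_t index;  // code-unit offset into the source
}

private bool isBlank(char c) { return c == ' ' || c == '\t'; }

/// Splits `src` into tokens, keeping whitespace and line ends as tokens.
RawToken[] lexAll(string src)
{
    RawToken[] toks;
    size_t i = 0;
    while (i < src.length)
    {
        immutable start = i;
        RawKind kind;
        if (src[i] == '\n')
        {
            kind = RawKind.endOfLine;
            ++i;
        }
        else if (isBlank(src[i]))
        {
            kind = RawKind.whitespace;
            while (i < src.length && isBlank(src[i]))
                ++i;
        }
        else
        {
            kind = RawKind.word;
            while (i < src.length && !isBlank(src[i]) && src[i] != '\n')
                ++i;
        }
        toks ~= RawToken(kind, src[start .. i], start);
    }
    toks ~= RawToken(RawKind.endOfFile, "", src.length);
    return toks;
}

/// A user who does not care about layout simply filters it out.
string[] wordsOnly(string src)
{
    return lexAll(src)
        .filter!(t => t.kind == RawKind.word)
        .map!(t => t.str)
        .array;
}
```

A user who does care about tabs can inspect the whitespace tokens (or the 
original string via index) and apply whatever tab width the editor uses, 
so SourcePos itself does not need to know about it.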


