Let's stop parser Hell
Christophe Travert
travert at phare.normalesup.org
Wed Aug 1 00:12:30 PDT 2012
"Jonathan M Davis", in message (digitalmars.D:173860), wrote:
> struct Token
> {
>     TokenType type;
>     string str;
>     LiteralValue value;
>     SourcePos pos;
> }
>
> struct SourcePos
> {
>     size_t line;
>     size_t col;
>     size_t tabWidth = 8;
> }
The occurrence of tabWidth surprises me.
What is col supposed to be? An index (code unit), a character number
(code point), or an estimate of where the character is supposed to be
printed on the line, given the provided tabWidth?
I don't think the lexer can really try to calculate at what column the
character is printed, since it depends on the editor (if you want to use
the lexer for syntax highlighting, for example), how it supports combining
characters, zero-width or multi-column characters, etc. (which you may not
want to have to decode).
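
To make the three readings of col concrete, here is a rough sketch. The
helper names codePointColumn and visualColumn are hypothetical and not part
of any proposed lexer API. The visual estimate assumes every non-tab code
point is exactly one column wide, which is precisely the assumption that
fails for combining and wide characters.

```d
/// Column counted in code points before code unit index `offset`.
/// Assumes `offset` falls on a code point boundary.
size_t codePointColumn(string line, size_t offset)
{
    size_t n = 0;
    foreach (dchar c; line[0 .. offset]) // foreach over dchar decodes UTF-8
        ++n;
    return n;
}

/// Estimated printed column: a tab advances to the next multiple of
/// tabWidth, and every other code point is ASSUMED to be one column wide.
size_t visualColumn(string line, size_t offset, size_t tabWidth = 8)
{
    size_t col = 0;
    foreach (dchar c; line[0 .. offset])
    {
        if (c == '\t')
            col += tabWidth - col % tabWidth;
        else
            ++col;
    }
    return col;
}

unittest
{
    string line = "\t\u00E9 = 1;"; // tab, e-acute (2 code units), " = 1;"
    size_t offset = 6;             // code unit index of '1'

    assert(line[offset] == '1');
    assert(codePointColumn(line, offset) == 5);
    assert(visualColumn(line, offset) == 12);
    assert(visualColumn(line, offset, 4) == 8);
}
```

The same character therefore sits at "column" 6, 5, 12 or 8 depending on
the definition and the tab width. The checks can be run with
dmd -unittest -main -run on a file containing the block.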
You may want to provide the number of tabs met so far. Note that there
are other whitespace characters that you may want to count, but you
shouldn't have a very complicated SourcePos structure. It might be easier
to have whitespace, end-of-line and end-of-file tokens, and let the user
filter out or take into account whatever they want. Or just let the
user look into the original string...
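
A minimal sketch of the whitespace-token idea, under stated assumptions:
the TokenType members and the simplified two-field Token below are
hypothetical stand-ins, not the types quoted above, and the helpers
significant and tabCount are made up for illustration.

```d
import std.algorithm : filter;
import std.array : array;

// Hypothetical token kinds; a real lexer would have many more.
enum TokenType { identifier, whitespace, endOfLine, endOfFile }

// Simplified stand-in for the quoted Token struct.
struct Token
{
    TokenType type;
    string str; // slice of the original source text
}

/// A user who does not care about layout drops whitespace and
/// end-of-line tokens.
auto significant(Token[] tokens)
{
    return tokens.filter!(t => t.type != TokenType.whitespace
                            && t.type != TokenType.endOfLine);
}

/// A user who does care counts the tabs without any help from the lexer.
size_t tabCount(Token[] tokens)
{
    size_t n = 0;
    foreach (t; tokens)
        if (t.type == TokenType.whitespace)
            foreach (char c; t.str)
                if (c == '\t')
                    ++n;
    return n;
}

unittest
{
    auto tokens = [
        Token(TokenType.whitespace, "\t"),
        Token(TokenType.identifier, "x"),
        Token(TokenType.endOfLine, "\n"),
        Token(TokenType.endOfFile, ""),
    ];

    assert(significant(tokens).array.length == 2); // identifier, endOfFile
    assert(tabCount(tokens) == 1);
}
```

With this split, SourcePos needs no tabWidth field: any tab handling
stays in user code that knows how the text will be displayed.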