DCT: D compiler as a collection of libraries

Jacob Carlborg doob at me.com
Fri May 11 02:36:26 PDT 2012


On 2012-05-11 11:22, Roman D. Boiko wrote:

>> What about line and column information?
> Indices of the first code unit of each line are stored inside the
> lexer, and a function will compute the Location (line number, column
> number, file specification) for any index. This way the size of a
> Token instance is reduced to the minimum. It is assumed that a
> Location can be computed on demand and is not needed frequently, so
> the column is calculated by a reverse walk to the previous end of
> line, etc. It will be possible to calculate Locations either taking
> special token sequences (e.g., #line 3 "ab/c.d") into account or
> discarding them.

Aha, clever. As long as I can get the information out, I'm happy :) How 
about adding properties for this to the token struct?
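The scheme Roman describes could be sketched roughly as below. This is 
only an illustration, not DCT's actual API: the names (Location, 
LineIndex, locate) are made up, and a real implementation would also 
handle #line directives and multi-byte line endings.

```d
import std.range : assumeSorted;

/// Source location, computed on demand (illustrative names).
struct Location
{
    size_t line;   // 1-based line number
    size_t column; // 1-based column, in code units
    string file;
}

struct LineIndex
{
    private size_t[] lineStarts; // index of the first code unit of each line
    private string file;

    this(string source, string file)
    {
        this.file = file;
        lineStarts = [0];
        foreach (i, c; source)
            if (c == '\n')
                lineStarts ~= i + 1;
    }

    /// Binary-search the stored line starts; the column is the
    /// distance back to the start of that line.
    Location locate(size_t index) const
    {
        auto sorted = assumeSorted(lineStarts);
        // Number of line starts at or before `index` gives the 1-based line.
        immutable line = sorted.lowerBound(index + 1).length;
        return Location(line, index - lineStarts[line - 1] + 1, file);
    }
}
```

With this, a Token only needs to carry its index into the source; a 
`location` property on the token struct could then call `locate` 
lazily.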

>>>> * Does it convert numerical literals and similar to their actual values
>>> It is planned to add a post-processor for that as part of parser,
>>> please see README.md for some more details.
>>
>> Isn't that a job for the lexer?
> That might be done in the lexer for efficiency reasons (to avoid
> lexing the token value again). But separating this into a dedicated
> post-processing phase leads to a much cleaner design (IMO), and is
> also suitable for uses where such values are not needed.

That might be the case. But I don't think it belongs in the parser.

> Also, I don't think that performance would be improved, given the
> ratio of the number of literals to the total number of tokens, and
> the need to store additional information per token if it is done in
> the lexer. I will elaborate on that later.

Ok, fair enough. Perhaps this could be a property in the Token struct as 
well. In that case I would suggest renaming "value" to 
lexeme/spelling/representation, or something like that, and naming the 
new property "value".
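That naming suggestion might look like the sketch below. Again, this is 
hypothetical, not DCT's actual Token definition; a real conversion 
would also handle underscores, non-decimal bases, and suffixes in 
integer literals.

```d
import std.conv : to;

enum TokenKind { integerLiteral, identifier /* ... */ }

struct Token
{
    TokenKind kind;
    string spelling; // the raw lexeme, exactly as it appeared in the source

    /// Convert the lexeme on demand, so the lexer itself never
    /// computes or stores values. (Decimal-only for this sketch.)
    long value() const @property
    {
        assert(kind == TokenKind.integerLiteral);
        return spelling.to!long;
    }
}
```

Keeping `spelling` verbatim also preserves the exact source text for 
tools (formatters, IDEs) that don't care about the converted value.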

-- 
/Jacob Carlborg


More information about the Digitalmars-d-announce mailing list