DCT: D compiler as a collection of libraries

Ary Manzana ary at esperanto.org.ar
Fri May 11 20:32:20 PDT 2012


On 5/11/12 10:14 PM, Roman D. Boiko wrote:
> On Friday, 11 May 2012 at 15:05:19 UTC, deadalnix wrote:
>> On 11/05/2012 16:02, Roman D. Boiko wrote:
>>> Technically, I'm trading the N*O(1) operations needed to track line and
>>> column while consuming each character for M*O(log(n)) operations when
>>> calculating them on demand. N is the number of characters, n is the
>>> number of lines, and M is the number of actual usages of Location. My
>>> assumption is that M << N (M is much smaller than N).
>>
>> N can easily be number of tokens.
> Yes, if you are looking up a token by its index rather than by location.
> E.g., autocompletion needs to find the last token before the cursor
> position. But that is unrelated to location search.
>
> Also, please note that I oversimplified the complexity formula; it has
> other components as well. My reply was just an additional minor comment.
> The motivation is design, not performance.
>
> One additional reason for such separation: with my approach it is
> possible to calculate Location either taking into account information
> from special token sequences or ignoring it. How would you do that with
> eager calculation? Calculate only one variant? Or track and store both?
>
> Unnecessary complexity will eventually find a way to shoot you in the
> foot :) Real-world usage (at least, anticipated scenarios) should be the
> basis for design. Sometimes this contradicts intuition and requires you
> to look at the problem from a different side.
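
For reference, the on-demand scheme described above amounts to recording
the offset of each newline during lexing and binary-searching those
offsets whenever a Location is requested. A rough sketch, with names I
made up (I don't know DCT's internals):

struct LineIndex {
    size_t[] newlineOffsets; // offset of every '\n' seen while scanning

    // O(log(n)) in the number of lines: find the last newline before
    // `offset` and derive line/column from it.
    void locate(size_t offset, out size_t line, out size_t column) {
        size_t lo = 0, hi = newlineOffsets.length;
        while (lo < hi) {
            immutable mid = (lo + hi) / 2;
            if (newlineOffsets[mid] < offset) lo = mid + 1;
            else hi = mid;
        }
        line = lo + 1; // 1-based line number
        column = lo == 0 ? offset + 1 : offset - newlineOffsets[lo - 1];
    }
}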

As deadalnix says, I think you are over-complicating things.

I mean, to store the column and line information it's just:

if (isNewLine(c)) {
   line++;      // advance to the next line
   column = 0;  // reset to the start of that line
} else {
   column++;
}

(I think you need to add that to the SourceRange class, then copy the line
and column to the token in the Lexer#lex() method.)
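
Roughly like this -- the helper names are guesses on my part, not DCT's
actual API:

Token lex() {
    skipWhitespaceAndComments(); // updates line/column as it consumes
    Token token;
    token.line = line;           // stamp the position tracked by the range
    token.column = column;
    scanToken(token);            // consume the token's characters
    return token;
}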

Do you really think it's that costly in terms of performance?

I think you are wasting much more memory and performance by storing all 
the tokens in the lexer.

Imagine I want to implement a simple syntax highlighter: just highlight 
keywords. How can I tell DCT *not* to store all the tokens, given that I 
only need them one at a time? And since I'll be highlighting in the 
editor, I will need line and column information. That means I'll have to 
do that O(log(n)) operation for every token.
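
All the highlighter wants is a loop like this (hypothetical range-style
API; token.line and token.column would come for free with eager tracking):

while (!lexer.empty) {
    auto token = lexer.front;
    if (token.type == TokenType.Keyword)
        highlight(token.line, token.column, token.text.length);
    lexer.popFront();
}

No token ever needs to outlive a single iteration.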

So you see, for the simplest use case of a lexer the performance of DCT 
is awful.

Now imagine I want to build an AST. Again, I consume the tokens one by 
one, probably peeking in some cases. If I want to keep line and column 
information I just copy it into the AST. You say the tokens are 
discarded, but their data is not, and that's exactly why their data is 
usually copied.
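
For example, a sketch with an invented node type:

class VarDeclaration {    // made-up AST node, just to illustrate
    size_t line, column;  // copied from the token; the token can be discarded
    string name;

    this(Token token) {
        line = token.line;
        column = token.column;
        name = token.text;
    }
}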

