Request for comments: std.d.lexer

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Jan 30 09:36:06 PST 2013


On 30-Jan-2013 21:21, FG wrote:
> On 2013-01-30 17:50, Dmitry Olshansky wrote:
>> Instead I suggest to try and allocate a big block of fixed size (say
>> about
>> 16-64K) upfront and copy identifiers one by one there. When it fills just
>> allocate another one and move on.
>
> Yeah, similar to what I suggested, except that probably it would be good
> to also have a look-up structure for identifiers, so that only unique
> strings go into those blocks.
>
> I wonder what would be a practical data structure for such look-ups:
> A trie is great except for the implementation hurdle to keep it also in
> one or a few memory blocks to prevent frequent allocations.

Mm, a trie shouldn't need that many allocations. Typically each node is 
fixed-size. Also, I don't see keeping it in one or a few memory blocks 
as a show-stopper. The trick is exactly the same as I mentioned above, 
but the blocks have to be quite a bit larger (say 8 times :) ).
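The block scheme mentioned above can be sketched roughly as follows. This is a 
minimal illustration in C++ rather than D, and every name in it is made up for 
the example: carve identifier copies out of large fixed-size chunks, and only 
allocate a fresh chunk when the current one fills up.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string_view>
#include <vector>

// Illustrative bump allocator: identifiers are copied into big fixed-size
// blocks; a new block is allocated only when the current one is full.
class BlockAllocator {
    static constexpr size_t kBlockSize = 64 * 1024;  // 64K, as in the thread
    std::vector<std::unique_ptr<char[]>> blocks_;
    size_t used_ = kBlockSize;  // "full", so the first store grabs a block

public:
    // Copy `ident` into the current block and return a stable view of it.
    std::string_view store(std::string_view ident) {
        assert(ident.size() <= kBlockSize);  // sketch: no oversized idents
        if (used_ + ident.size() > kBlockSize) {
            blocks_.push_back(std::make_unique<char[]>(kBlockSize));
            used_ = 0;
        }
        char* dst = blocks_.back().get() + used_;
        std::memcpy(dst, ident.data(), ident.size());
        used_ += ident.size();
        return {dst, ident.size()};
    }
};
```

Earlier copies stay valid when a new block is added, because old blocks are 
never moved or freed, which is exactly what makes slices into them safe to 
hand out as token payloads.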

Truth be told, a trie should be optimal for this, but implementing a 
fast trie takes a lot more effort than rolling out a simple hash table. 
That's something I hope to change once my Trie template is polished 
enough for a Phobos proposal.
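To make the fixed-size-node point concrete, here is a hedged sketch (C++ 
rather than D, all names hypothetical) of a trie whose nodes all live in one 
growable array. Children are referenced by index rather than pointer, so 
inserting never does a per-node heap allocation and the whole structure sits 
in a handful of contiguous blocks.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Illustrative trie with uniform, fixed-size nodes stored contiguously.
// Index 0 is the root; child index 0 doubles as "no child".
class IdTrie {
    struct Node {
        uint32_t child[128] = {0};  // ASCII fan-out, fine for identifiers
        bool terminal = false;      // true if a stored string ends here
    };
    std::vector<Node> nodes_{1};    // starts with just the root

public:
    // Returns true if `s` was newly inserted, false if it was already there.
    bool insert(std::string_view s) {
        uint32_t cur = 0;
        for (unsigned char c : s) {
            uint32_t next = nodes_[cur].child[c & 0x7F];
            if (next == 0) {
                next = static_cast<uint32_t>(nodes_.size());
                nodes_[cur].child[c & 0x7F] = next;  // link before growing
                nodes_.emplace_back();
            }
            cur = next;
        }
        bool fresh = !nodes_[cur].terminal;
        nodes_[cur].terminal = true;
        return fresh;
    }
};
```

A production version would compress the fan-out (these nodes are 512+ bytes 
each), which is part of why a fast trie is more work than a hash table.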

> A simpler approach would be to make it an array of (hash, string_slice)
> pairs, sorted by hash (of the identifier) - lots of string scanning though.
>

Mm, since you need to detect duplicates as you lex the file, I suspect 
sorting is out of the question. I'd be modest and, for starters, just 
use a hash table and tune the hash function a bit.
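The hash-table interning approach could look something like the sketch below, 
again in C++ rather than D and with all names invented for the example. FNV-1a 
stands in for whatever "tuned" hash one would actually settle on; duplicates 
resolve to the same stored copy as the file is lexed.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// FNV-1a, a simple placeholder for a tuned identifier hash.
struct Fnv1a {
    size_t operator()(std::string_view s) const {
        size_t h = 14695981039346656037ull;
        for (unsigned char c : s) { h ^= c; h *= 1099511628211ull; }
        return h;
    }
};

// Illustrative interner: each distinct identifier is stored once; repeats
// return a view of the existing copy.
class Interner {
    std::unordered_set<std::string, Fnv1a> pool_;

public:
    std::string_view intern(std::string_view ident) {
        auto [it, fresh] = pool_.insert(std::string(ident));
        (void)fresh;
        return *it;  // stable: unordered_set never moves stored elements
    }
};
```

Interning two occurrences of the same identifier yields views over the same 
bytes, so later stages can compare identifiers by pointer instead of by 
content.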


-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list