Request for comments: std.d.lexer
Dmitry Olshansky
dmitry.olsh at gmail.com
Wed Jan 30 09:36:06 PST 2013
30-Jan-2013 21:21, FG wrote:
> On 2013-01-30 17:50, Dmitry Olshansky wrote:
>> Instead I suggest to try and allocate a big block of fixed size (say
>> about
>> 16-64K) upfront and copy identifiers one by one there. When it fills just
>> allocate another one and move on.
>
> Yeah, similar to what I suggested, except that probably it would be good
> to also have a look-up structure for identifiers, so that only unique
> strings go into those blocks.
>
> I wonder what would be a practical data structure for such look-ups:
> A trie is great except for the implementation hurdle to keep it also in
> one or a few memory blocks to prevent frequent allocations.
Mm, a trie shouldn't need that many allocations. Typically each node is
fixed-sized. Nor do I see keeping it in a few memory blocks as a show
stopper: the trick is exactly the same as I mentioned above, only the
blocks have to be quite a bit larger (say 8 times :) ).
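To make the fixed-size-node point concrete, here is a rough sketch (all names are hypothetical, nothing from std.d.lexer): nodes live in large arrays and are addressed by index, so inserting a node allocates only once per block. The 26-letter fanout is an assumption to keep the example tiny.

```d
import std.stdio;

// Sketch of a trie with fixed-size nodes pooled in large blocks,
// so node creation almost never hits the allocator.
struct Trie
{
    struct Node
    {
        uint[26] next;   // child indices; 0 means "no child" (node 0 is
                         // the root, and nothing ever points back at it)
        bool terminal;   // a key ends at this node
    }

    enum blockSize = 4096;   // nodes per block
    Node[][] blocks;         // each block is a single allocation
    size_t count;            // nodes handed out so far

    uint allocNode()
    {
        if (count % blockSize == 0)
            blocks ~= new Node[blockSize];   // one allocation per 4096 nodes
        return cast(uint) count++;
    }

    ref Node node(uint i) { return blocks[i / blockSize][i % blockSize]; }

    void insert(string word)
    {
        if (count == 0)
            allocNode();                     // node 0 is the root
        uint cur = 0;
        foreach (char c; word)
        {
            immutable slot = c - 'a';        // assumes lowercase ASCII input
            uint nxt = node(cur).next[slot];
            if (nxt == 0)
            {
                nxt = allocNode();
                node(cur).next[slot] = nxt;
            }
            cur = nxt;
        }
        node(cur).terminal = true;
    }

    bool contains(string word)
    {
        if (count == 0) return false;
        uint cur = 0;
        foreach (char c; word)
        {
            cur = node(cur).next[c - 'a'];
            if (cur == 0) return false;
        }
        return node(cur).terminal;
    }
}

void main()
{
    Trie t;
    t.insert("foo");
    t.insert("foobar");
    writeln(t.contains("foo"));     // true
    writeln(t.contains("foob"));    // false: no key ends there
    writeln(t.blocks.length);       // 1: seven nodes fit in one block
}
```

Because a "pointer" is just a uint index into the pool, the node blocks never need to be contiguous with each other, which is why splitting the trie across a few memory blocks costs nothing at lookup time.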
Truth be told, a trie should be optimal for this, but a fast trie takes
a lot more effort to implement than rolling out a simple hash table.
That's something I hope to change once my Trie template is polished
enough for a Phobos proposal.
> A simpler approach would be to make it an array of (hash, string_slice)
> pairs, sorted by hash (of the identifier) - lots of string scanning though.
>
Mm, since you need to discover what's unique as you lex the file, I
suspect sorting is out of the question. I'd be modest and, for starters,
just use a hash table and tune the hash function a bit.
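Combining the two ideas above — big fixed-size blocks for storage plus a hash table for uniquing — might look like this sketch (names are hypothetical; a built-in associative array stands in for the tuned hash table):

```d
// Sketch of identifier interning: unique identifiers are copied into
// 16K blocks allocated up front, and a hash table maps each seen
// identifier to its single canonical copy.
struct InternPool
{
    enum blockSize = 16 * 1024;  // "say about 16-64K"
    char[][] blocks;             // filled blocks plus the current one
    size_t pos;                  // write position in the current block
    string[string] table;        // identifier -> its interned copy

    // Returns the canonical copy of `id`, copying it into a block on
    // first sight. `id` may be a transient slice of the lexer's buffer.
    // (Identifiers longer than blockSize are not handled in this sketch.)
    string intern(const(char)[] id)
    {
        if (auto p = id in table)    // AA lookup works with a char[] key
            return *p;
        if (blocks.length == 0 || pos + id.length > blockSize)
        {
            blocks ~= new char[blockSize];   // block is full: grab another
            pos = 0;
        }
        auto dst = blocks[$ - 1][pos .. pos + id.length];
        dst[] = id[];
        pos += id.length;
        auto s = cast(string) dst;   // treated as immutable from here on
        table[s] = s;
        return s;
    }
}

void main()
{
    InternPool pool;
    auto a = pool.intern("main");
    auto b = pool.intern("main".dup);  // same text, different memory
    assert(a is b);                    // exactly one stored copy
}
```

The point of uniquing during lexing is that every later identifier comparison becomes a pointer comparison (`is`), and only the first occurrence of each identifier ever touches a block.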
--
Dmitry Olshansky