Making generalized Trie type in D
Roman D. Boiko
rb at d-coding.com
Mon Jun 4 11:42:58 PDT 2012
On Monday, 4 June 2012 at 15:35:31 UTC, Dmitry Olshansky wrote:
> On 04.06.2012 18:19, Andrei Alexandrescu wrote:
>> On 6/4/12 4:46 AM, Dmitry Olshansky wrote:
>>> Cross-posting from my GSOC list as I think we had a lack of D
>>> rox posts
>>> lately :)
>> [snip]
>>
>> I think it would be great if you converted this post into an
>> article.
>>
>> Andrei
>>
>
> Sounds good, will do once I fix few issues that were mentioned
> (bit-packing, GC types etc.)
Would be interesting to see some more examples, along with
rationale/motivation for various aspects of your API, and
possible usage scenarios.
Tries are fundamental (both data structures and respective
algorithms) for lookup problems, in the same way as arrays are
fundamental for indexed access.
For example, they have several advantages over hash tables. Hash
calculation requires const * key.length operations, which is
equivalent to the number of comparisons needed for trie lookup.
But hash tables may be less space efficient, or lead to hash
collisions which increase lookup time. Also, it is possible to
create persistent (immutable) tries with efficient (log N)
inserting / deleting (this scenario is very important for my DCT
project). Immutable hash tables would require 0(N) copying for
each insert / delete.
It is difficult to create a good API for fundamental data
structures, because various use cases would motivate different
trade-offs. The same is true for implementation. This is why I
like your decision to introduce policies for configuration.
Rationale and use cases should help to analyze design of your API
and implementation, thus you will get better community feedback :)
Below are some notes related to my DCT use cases.
Your examples deal with lookup by the whole word (first/last
characters and length are needed). Are your API and
implementation adaptable for character-by-character trie lookup?
Will compile-time generation of lookup code based on tries be
supported? Example which is currently in DCT (first implemented
by Brian Schott in his Dscanner project) uses switch statements
(which means lookup linear in number of possible characters at
each position). A trivial improvement might be using if
statements and binary lookup. (E.g., if there are 26 possible
characters used at some position, use only 5 comparisons, not 26).
I wanted to analyse your regex implementation, but that's not an
easy task and requires a lot of effort... It looks like the most
promising alternative to binary trie lookup which I described in
previous paragraph. Similarities and differences with your regex
design might also help us understand tries better.
More information about the Digitalmars-d
mailing list