Making generalized Trie type in D

Mon Jun 4 11:42:58 PDT 2012

On Monday, 4 June 2012 at 15:35:31 UTC, Dmitry Olshansky wrote:
> On 04.06.2012 18:19, Andrei Alexandrescu wrote:
>> On 6/4/12 4:46 AM, Dmitry Olshansky wrote:
>>> Cross-posting from my GSOC list as I think we had a lack of D 
>>> rox posts
>>> lately :)
>> [snip]
>>
>> I think it would be great if you converted this post into an 
>> article.
>>
>> Andrei
>>
>
> Sounds good, will do once I fix few issues that were mentioned 
> (bit-packing, GC types etc.)

Would be interesting to see some more examples, along with 
rationale/motivation for various aspects of your API, and 
possible usage scenarios.

Tries are fundamental (both data structures and respective 
algorithms) for lookup problems, in the same way as arrays are 
fundamental for indexed access.

For example, they have several advantages over hash tables. Hash 
calculation requires const * key.length operations, which is 
equivalent to the number of comparisons needed for trie lookup. 
But hash tables may be less space efficient, or lead to hash 
collisions which increase lookup time. Also, it is possible to 
create persistent (immutable) tries with efficient (log N) 
inserting / deleting (this scenario is very important for my DCT 
project). Immutable hash tables would require 0(N) copying for 
each insert / delete.

It is difficult to create a good API for fundamental data 
structures, because various use cases would motivate different 
trade-offs. The same is true for implementation. This is why I 
like your decision to introduce policies for configuration. 
Rationale and use cases should help to analyze design of your API 
and implementation, thus you will get better community feedback :)

Below are some notes related to my DCT use cases.

Your examples deal with lookup by the whole word (first/last 
characters and length are needed). Are your API and 
implementation adaptable for character-by-character trie lookup?

Will compile-time generation of lookup code based on tries be 
supported? Example which is currently in DCT (first implemented 
by Brian Schott in his Dscanner project) uses switch statements 
(which means lookup linear in number of possible characters at 
each position). A trivial improvement might be using if 
statements and binary lookup. (E.g., if there are 26 possible 
characters used at some position, use only 5 comparisons, not 26).

I wanted to analyse your regex implementation, but that's not an 
easy task and requires a lot of effort... It looks like the most 
promising alternative to binary trie lookup which I described in 
previous paragraph. Similarities and differences with your regex 
design might also help us understand tries better.