Ternary Search Trees

Fri Apr 17 02:40:27 PDT 2009

Fawzi:
>so TST in the structure refers by default to the struct itself.<

Thank you, now I understand. I am slowly learning more.

-------------------------

Jason House:

> Why use a struct for TST?<

To give you a good answer I have done some tests using a file "english_words.txt", of about 2.1 MB that contains about 220_000 different words.

The following tiny program loads that file and splits it.
It requires 6.2 MB of RAM and about 0.09 seconds to run (once the file cache is warm. This thanks to the last patch to the GC that speeds this up a lot):

import std.stdio: writefln;
import std.string: split;
import std.file: read;
void main() {
    auto words = split(cast(string)read("english_words.txt"));
}

--------

The following program tests the TST (positive lookups only, no negative lookups yet), this is for struct-based TST!(char). For the TST-class-based version , you can just replace *root with root:

void main() {
    auto words = split(cast(string)read("english_words.txt"));
    auto root = new TST!(char);
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in *root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

The struct-based needs 3.95 seconds and 24.35 MB RAM to run.
The class-based (final methods) needs 4.03 seconds and 24.56 MB RAM (DMD v1.042 on WinXP, 2 GHz CPU).

Both versions allocate 461_495 TST nodes. [Each TST struct node needs 16 bytes. Each TST object probably 24 bytes that I think the GC allocates as 32. But the actual memory used denies this twofold difference, I don't know why.]

I guess the speed difference comes mostly from the higher memory overhead of classes (because the more memory there is, the more the CPU caches have to work).

--------

With a tiny change, adding "scope", so just the root of the tree gets allocated on the stack seems to reduce the running time of the class-based version from 4.03 to 3.99 seconds.

I don't know why moving the root to the stack gives so much difference. It's not a small difference, because on a 2 GHz CPU ~0.04 seconds mean something like 80 million clock ticks. The root reference is traversed each time, so the removal of just this level of indirection may explain the time difference. A mammalian brain like mine isn't well equipped to understand something that performs billions of operations every second. I have taken a look at the ASM generated by the class-based with and without scope, but I have not understood the asm well enough:

void main() {
    auto words = split(cast(string)read("english_words.txt"));
    scope auto root = new TST!(char); // scope ****
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

--------

This, with a struct-based TST, needs 3.96 seconds (0.48 s just building the tree):

void main() {
    auto words = split(cast(string)read("english_words.txt"));
    TST!(char) root;
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

--------

The following with a struct-based TST, needs 3.71 s, 13.68 MB (0.32 s just building, 0.31 s with no GC) (0.26 s just building, allocating memory with calloc, total 3.52 s, 13.37 MB):

TST[] nodes;
int nnode;

void main() {
    nodes = new TST!(char)[461_495];
    auto words = split(cast(string)read("english_words.txt"));
    auto root = nodes[nnode++];
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

Idem, but struct fields aligned to 1 byte: 3.89 s, 12.72 MB.  (0.48 s just building), (12.46 MB allocating memory with calloc)

--------

Without align, using std.gc.malloc + hasNoPointers + std.c.memset: 3.53 s total.

void main() {
    disable();
    TST.nodes = cast(TST*)gcmalloc(461_495 * TST.sizeof);
    hasNoPointers(TST.nodes);
    memset(TST.nodes, 0, 461_495 * TST.sizeof);
    auto words = split(cast(string)read("english_words.txt"));
    auto root = TST.nodes[TST.nnode++];
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

Of course you can add static methods to the struct to create the array, create a new node moving an index forward, and freeing the whole array, plus static fields of the array and #nodes used so far.

--------

Once tree nodes are items of an array, you can use indexes to address node children, so if memory gets very tight you can even use 24-bit integers as indexes:
http://www.fantascienza.net/leonardo/so/dlibs/uint24.html

They are very slow, so in most situations you want to use ints:

struct TST(T, TyIndex=int) {...

Then this uses less memory, as low as 11 bytes/node, but TST must also be aligned to 1 byte:

TST!(char, Uint24) root; // 

--------

The last examples work only if you know the number of nodes at compile time. 
If you need a more dynamic data structure, the code gets more tricky/hairy.

Probably the best thing you can do is to keep an array of blocks of nodes, each block not too much big (es 64 KB), and allocate such blocks on-demand. Something like (code untested, it surely contains bugs. This comes after several rounds of code simplification):

struct TST(T) {
    // *** TST instance attributes here ***

    struct Block {
        // Block is not too much big nor too much small
        const int SIZE = ((1 << 16) / TST.sizeof) > 1 ? ((1 << 16) / TST.sizeof) : 1;
        TST[TST.Block.SIZE] nodes;
    }

    // usedLastBlock keeps the number of nodes used from the last block of blocks
    // there's no need to keep the whole number of blocks, because its usage is
    // slower and all the blocks but the last one are full anyway.
    static int usedLastBlock;

    static Block*[] blocks;

    static TST* newNode() {
        if (!TST.blocks.length || TST.usedLastBlock >= TST.Block.SIZE) {
            // add new block
            TST.blocks.length = blocks.length + 1;
            // this whole new block gets scanned by the GC even if you need only 1 node
            TST.blocks[$-1] = new Block;
            TST.usedLastBlock = 0;
        }

        return &(TST.blocks[$-1].nodes[TST.usedLastBlock++]);
    }

    /// deallocate blocks of nodes
    static void free() {
        foreach (block_ptr; TST.blocks)
            delete block_ptr;
        TST.blocks[] = null;
        TST.blocks.length = 0;
    }

    // *** all TST methods here ***
}

This last code doesn't work yet, I have so far failed to debug it, the compiler returns:
struct test.TST!(char).TST no size yet for forward reference
I'll try to fix it when I have the time. If you know how to fix it, or what the problem is, please tell me.

Bye,
bearophile