2.011 invariant question

Sat Feb 23 05:37:10 PST 2008

On 23/02/2008, Sönke Ludwig <ludwig at informatik_dot_uni-luebeck.de> wrote:
> I think that if it's fixed, it has to be done transparently or it my be
>  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.

You know, the more I think about it, the more I come to the conclusion
that you're right, and that

    x ~= y;

should behave /exactly/ like

    x = x ~ y;

and cause a reallocation every time. Incremental string building is
the usual reason given why this isn't the case, but I think there's
another way round that problem.

Think about what happens at runtime when x ~= y; is executed. First,
the compiler has to decide whether or not x is resizable (that is,
does it point to the start of a resizable block?). Well - how does it
know that? Answer: it has to ask the garbage collector. OK, so how
does the garbage collector know that? Well, presumably, the gc has to
have some sort of map or table, so it can look up the address x, and
if it's in the table, then the answer is yes. But that lookup
operation has to be slower that O(1). No matter how you look at it,
there has to be at least O(log N) CPU time being consumed there, where
N is the number of gc allocated blocks.

It seems to me that a better approach would be to ditch that strategy
altogether, let ~= have predictable behavior even when the array is
non-unique, and use a /different/ strategy for incremental array
building. I would suggest something as simple as:

    struct Buffer!(T)
    {
        private T[] array;
        private uint reserved;
        void opCatAssign() {...}
        T[] toArray() {...}
        const(T)[] toCArray() {...}
        invariant(T)[] toIArray() {...}
        /*etc*/
    }

would do the job. Then instead of doing

    void buildString()
    {
        string r;
        r ~= something;
        r ~= somethingElse;
        return r;
    }

you would do

    void buildString()
    {
        Buffer!(char) r;
        r ~= something;
        r ~= somethingElse;
        return r.toIArray;
    }

toIArray() would have to call either assumeUnique or idup under the
hood. We could argue about which was the most sensible choice. But
either way, you saved a lot of CPU time because you no longer have to
decide at runtime whether or not this is a resizable buffer - that's
now something you know at compile time. And even when you do know
that, finding the reserved size is O(1), not O(log N). It's gotta be a
winner all round.