2.011 invariant question

Sat Feb 23 08:04:20 PST 2008

Janice Caron wrote:
> On 23/02/2008, Sönke Ludwig <ludwig at informatik_dot_uni-luebeck.de> wrote:
>> I think that if it's fixed, it has to be done transparently or it my be
>>  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.
> 
> You know, the more I think about it, the more I come to the conclusion
> that you're right, and that
> 
>     x ~= y;
> 
> should behave /exactly/ like
> 
>     x = x ~ y;
> 
> and cause a reallocation every time. Incremental string building is
> the usual reason given why this isn't the case, but I think there's
> another way round that problem.
> 
> Think about what happens at runtime when x ~= y; is executed. First,
> the compiler has to decide whether or not x is resizable (that is,
> does it point to the start of a resizable block?). Well - how does it
> know that? Answer: it has to ask the garbage collector. OK, so how
> does the garbage collector know that? Well, presumably, the gc has to
> have some sort of map or table, so it can look up the address x, and
> if it's in the table, then the answer is yes. But that lookup
> operation has to be slower that O(1). No matter how you look at it,
> there has to be at least O(log N) CPU time being consumed there, where
> N is the number of gc allocated blocks.
> 
> It seems to me that a better approach would be to ditch that strategy
> altogether, let ~= have predictable behavior even when the array is
> non-unique, and use a /different/ strategy for incremental array
> building. I would suggest something as simple as:
> 
>     struct Buffer!(T)
>     {
>         private T[] array;
>         private uint reserved;
>         void opCatAssign() {...}
>         T[] toArray() {...}
>         const(T)[] toCArray() {...}
>         invariant(T)[] toIArray() {...}
>         /*etc*/
>     }
> 
> would do the job. Then instead of doing
> 
>     void buildString()
>     {
>         string r;
>         r ~= something;
>         r ~= somethingElse;
>         return r;
>     }
> 
> you would do
> 
>     void buildString()
>     {
>         Buffer!(char) r;
>         r ~= something;
>         r ~= somethingElse;
>         return r.toIArray;
>     }
> 
> toIArray() would have to call either assumeUnique or idup under the
> hood. We could argue about which was the most sensible choice. But
> either way, you saved a lot of CPU time because you no longer have to
> decide at runtime whether or not this is a resizable buffer - that's
> now something you know at compile time. And even when you do know
> that, finding the reserved size is O(1), not O(log N). It's gotta be a
> winner all round.
> 

Sounds like a good approach. The rare cases where the in-place resizing
behavior of the build-in arrays is needed for performance could very
well be handled with a standard Buffer/vector/whatever. To me, it feels a bit
strange to use such an optimization without an explicit reserve()/capacity()
interface anyway... and the lookup performance argument is sound, too.