2.011 invariant question

Sönke Ludwig ludwig at informatik_dot_uni-luebeck.de
Wed Feb 27 01:14:17 PST 2008


Frits van Bommel wrote:
> 
> A hashtable could get you an average O(1) for that lookup (the worst 
> case could still be O(N) or O(log N) though, depending on the 
> implementation). You wouldn't need to enter every allocation in there, 
> by the way. If GC pools are always allocated on some aligned address, 
> the start of one can be found by bitwise-anding out the lower bits of 
> the pointer, and that address could be checked in the hashtable. "is 
> this the start of an allocation" then becomes a hashtable lookup plus a 
> check "cast(size_t)(p - pool_start) % pool_objsize == 0"[1]. (The 0 may 
> need to be adjusted to allocation_header.sizeof.)
> 
> [1]: For pool_objsize == 2^N for some N no actual division needs to be 
> performed; "& ((1 << N) - 1)" has the same effect.
> 
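A minimal C sketch of the lookup described in the quoted text (all names are hypothetical; a small linear-scan table stands in for the hashtable, and 64 KiB pool alignment is assumed):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

#define POOL_ALIGN ((uintptr_t)1 << 16)  /* assumed 64 KiB pool alignment */

/* Toy stand-in for the GC's pool hashtable: maps a pool's start
 * address to its object size (0 means "not a known pool"). */
static struct { uintptr_t start; size_t objsize; } pools[4];
static size_t npools;

static size_t pool_objsize_lookup(uintptr_t pool_start)
{
    for (size_t i = 0; i < npools; i++)
        if (pools[i].start == pool_start)
            return pools[i].objsize;
    return 0;
}

static bool is_allocation_start(const void *p)
{
    /* Mask off the low bits to recover the pool's base address. */
    uintptr_t addr = (uintptr_t)p;
    uintptr_t pool_start = addr & ~(POOL_ALIGN - 1);

    size_t objsize = pool_objsize_lookup(pool_start);  /* avg. O(1) */
    if (objsize == 0)
        return false;  /* pointer does not fall into a known pool */

    size_t offset = (size_t)(addr - pool_start);
    if ((objsize & (objsize - 1)) == 0)        /* objsize == 2^N: */
        return (offset & (objsize - 1)) == 0;  /* no division needed */
    /* The 0 here may need adjusting for an allocation header. */
    return offset % objsize == 0;
}
```

The power-of-two branch is footnote [1] from the quote: when the object size is 2^N, the modulo collapses to a bitwise AND.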
> 
> On another note, how about this for a rule on concatenating to invariant 
> arrays:
> For each heap-allocated array, put both the reserved size and *the 
> maximum used size* in the header of the allocation. Whenever arr.ptr + 
> arr.length is equal to allocation.start + allocation.max_size_ever, it's 
> safe to append to it in-place as long as you increase the maximum size 
> accordingly.
> (This'll need some synchronization to be thread-safe, but when there are 
> multiple threads allocations are currently already synchronized anyway)
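The rule just quoted could be sketched like this in C (hypothetical layout; a plain pointer field stands in for the real allocation header, and the synchronization mentioned above is left out):

```c
#include <stddef.h>
#include <stdbool.h>

typedef struct {
    size_t reserved;       /* bytes usable for array contents */
    size_t max_size_ever;  /* high-water mark of used bytes */
    char  *data;           /* start of the array contents */
} Allocation;

typedef struct {
    char  *ptr;
    size_t length;
} Array;

/* Appending in place is safe iff the slice ends exactly at the
 * allocation's high-water mark and there is room left. */
static bool can_append_in_place(const Array *arr, const Allocation *alloc,
                                size_t extra)
{
    if (arr->ptr + arr->length != alloc->data + alloc->max_size_ever)
        return false;  /* slice does not end at max_size_ever */
    return alloc->max_size_ever + extra <= alloc->reserved;
}

/* On success the caller bumps the high-water mark (in a real GC this
 * update would need to be synchronized to stay thread-safe). */
static void commit_append(Allocation *alloc, size_t extra)
{
    alloc->max_size_ever += extra;
}
```

Note that a trailing slice (one ending exactly at `max_size_ever`) passes the check even if it doesn't start at the beginning of the allocation, which is precisely the case discussed next.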
> 
> Note that the rule above allows appending to trailing slices; this is 
> unsafe for mutable arrays, since those have traditionally been 
> reallocated in this case and programs may rely on them not sharing data 
> with the original. However, it is safe for invariant arrays, since the 
> shared prefix can't change -- the only way to detect sharing would be 
> pointer comparisons, which are not required to provide meaningful 
> information anyway (IIRC the spec allows "moving" garbage collectors, 
> so you can't count on your data staying where it is).
> 
> This rule would allow incremental string building to be fairly efficient 
> even for invariant arrays; at worst, the initial append will trigger a 
> reallocation before an allocated block of memory runs out. The rest will 
> only reallocate when it's actually needed just like with mutable-string 
> building. (At least, as long as you don't slice away elements from the end)

This approach would probably be the way to go - I don't know how bad the
implementation overhead would be, but probably not too high. In any case,
the idea sounds very appealing, and the behaviour it provides is very
unintrusive.

The block look-up currently seems to cache the last accessed block, so
successively appending to a single array is always O(1) - only when two
or more arrays are built in parallel does a real look-up have to be made.
This is probably sufficient for most use cases, and I think the
maintenance cost of a hashtable would outweigh the benefit.
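A toy sketch of that caching behaviour (hypothetical names; a fixed stub stands in for the real block search, and a counter makes the cache hits visible):

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t cached_start;   /* base of the last block looked up */
static size_t    cached_size;
static unsigned  slow_lookups;   /* counts full (non-cached) lookups */

/* Stand-in for the real block search; here blocks are simply assumed
 * to be 256 bytes and 256-byte aligned. */
static uintptr_t slow_find(uintptr_t p, size_t *size)
{
    slow_lookups++;
    *size = 256;
    return p & ~(uintptr_t)255;
}

static uintptr_t find_block(uintptr_t p)
{
    /* Unsigned subtraction: if p lies below cached_start, the
     * difference wraps to a huge value and the test fails, as wanted. */
    if (cached_size && p - cached_start < cached_size)
        return cached_start;                    /* cache hit: O(1) */
    cached_start = slow_find(p, &cached_size);  /* miss: real look-up */
    return cached_start;
}
```

Successive appends to one array keep hitting the cache; interleaving appends to a second array in a different block evicts it and forces a real look-up each time the blocks alternate.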

So a dedicated container for array building can of course never hurt, but
this scheme at least doesn't make one any more necessary than it is now.



More information about the Digitalmars-d mailing list