2.011 invariant question
Frits van Bommel
fvbommel at REMwOVExCAPSs.nl
Sun Feb 24 16:02:39 PST 2008
Janice Caron wrote:
> On 23/02/2008, Sönke Ludwig <ludwig at informatik_dot_uni-luebeck.de> wrote:
>> I think that if it's fixed, it has to be done transparently or it my be
>> difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.
>
> You know, the more I think about it, the more I come to the conclusion
> that you're right, and that
>
> x ~= y;
>
> should behave /exactly/ like
>
> x = x ~ y;
>
> and cause a reallocation every time. Incremental string building is
> the usual reason given why this isn't the case, but I think there's
> another way round that problem.
>
> Think about what happens at runtime when x ~= y; is executed. First,
> the compiler has to decide whether or not x is resizable (that is,
> does it point to the start of a resizable block?). Well - how does it
> know that? Answer: it has to ask the garbage collector. OK, so how
> does the garbage collector know that? Well, presumably, the gc has to
> have some sort of map or table, so it can look up the address x, and
> if it's in the table, then the answer is yes. But that lookup
> operation has to be slower that O(1). No matter how you look at it,
> there has to be at least O(log N) CPU time being consumed there, where
> N is the number of gc allocated blocks.
A hashtable could get you an average O(1) for that lookup (the worst
case could still be O(N) or O(log N) though, depending on the
implementation). You wouldn't need to enter every allocation in there,
by the way. If GC pools are always allocated on some aligned address,
the start of one can be found by bitwise-anding out the lower bits of
the pointer, and that address could be checked in the hashtable. "is
this the start of an allocation" then becomes a hashtable lookup plus a
check "(cast(size_t)(p - pool_start) % pool_objsize == 0"[1]. (The 0 may
need to be adjusted to allocation_header.sizeof)
[1]: For pool_objsize == 2^N for some N no actual division needs to be
performed; "& ((1 << N) - 1)" has the same effect.
On another note, how about this for a rule on concatenating to invariant
arrays:
For each heap-allocated array, put both the reserved size and *the
maximum used size* in the header of the allocation. Whenever arr.ptr +
arr.length is equal to allocation.start + allocation.max_size_ever, it's
safe to append to it in-place as long as you increase the maximum size
accordingly.
(This'll need some synchronization to be thread-safe, but when there are
multiple threads allocations are currently already synchronized anyway)
Note that the rule above allows appending to trailing slices; this is
unsafe for mutable arrays since they traditionally reallocated in this
instance and programs may rely on them not sharing data with the
original. However, this is safe for invariant arrays since the shared
prefix can't change -- the only way to detect sharing would be pointer
comparisons which are not required to provide meaningful information
anyway (IIRC the spec allows "moving" garbage collectors, so pointer
comparisons aren't reliable anyway since you can't count on your data
staying where it is).
This rule would allow incremental string building to be fairly efficient
even for invariant arrays; at worst, the initial append will trigger a
reallocation before an allocated block of memory runs out. The rest will
only reallocate when it's actually needed just like with mutable-string
building. (At least, as long as you don't slice away elements from the end)
More information about the Digitalmars-d
mailing list