BitArray implementation issue

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Tue Jul 22 17:57:48 PDT 2014


On Tue, Jul 22, 2014 at 09:16:35PM +0000, safety0ff via Digitalmars-d wrote:
> Currently the implementation of BitArray's length setter is badly broken.
> Its implementation contains this code segment:
> 
>     if (newdim != olddim) {
>         auto b = ptr[0 .. olddim];
>         b.length = newdim;    // realloc
>         ptr = b.ptr;
>         if (newdim & (bitsPerSizeT-1)) // Set any pad bits to 0
>             ptr[newdim - 1] &= ~(~0 << (newdim & (bitsPerSizeT-1)));
>     }
> 
> There are many bugs in the last 2 lines:
> 1) If we're increasing the length, ptr[newdim-1] is necessarily zero,
> so we're doing nothing.
> 2) If we're reducing the length, we're essentially clearing arbitrary
> bits whenever the branch is taken, furthermore:
> 3) On 64 bit, we're always clearing the upper 32 bits of the last
> word.
> 4) On 64 bit, we have undefined behaviour from the left shift.

Wow. Looks like this is in dire need of some TLC.


> While trying to fix this the question was raised: should we even clear
> any bits?

You might want to consider implementing a way of tracking how many bits
in the final word are valid. That way, you can correctly trigger
reallocation if the user tries to write to a bit beyond the current end
of the array (instead of stomping over other BitArray's data), and you
don't have to clear any bits except newly-allocated words. Assuming we
want to keep dynamic-array-like behaviour for BitArray, that is. I think
it's a good idea.


> The underlying array can be shared between BitArray's (assignment
> behaves like dynamic array assignment.)
> This means that clearing the bits might affect other BitArray's if the
> extension does not cross a size_t boundary.
> So it is difficult to mimic D dynamic array semantics at a bit level
> without a reference counting mechanism.

I think it's best to keep a bit count of the number of valid bits at the
end of the array, in addition to the number of words in the underlying
array. You might also want to consider keeping a bit count of the number
of valid bits at the *start* of the array, so that it's possible to
bit-level slicing in a transparent way.


> Furthermore, a BitArray can function as an arbitrary data
> reinterpreter via the function:
> void init(void[] v, size_t numbits)
> 
> Help is needed for deciding whether setting length should zero the new bits
> (even if it means stomping other BitArrays.)

It shouldn't. Keeping track of the number of valid bits in the last word
should solve this problem.


> Currently I've implemented zero'ing bits upon extension in my PR [1],
> but it is stalled on this design issue.
> 
> [1] https://github.com/D-Programming-Language/phobos/pull/2249

I think the best solution is to keep an additional count of the number
of valid bits in the last word of the array. Then you don't have to
worry about stomping other BitArrays, and you can potentially implement
bit-level slicing too.


T

-- 
IBM = I Blame Microsoft


More information about the Digitalmars-d mailing list