BitArray/BitFields - Review

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Jul 28 23:27:31 PDT 2012


On 29-Jul-12 03:16, Era Scarecrow wrote:
> On Saturday, 28 July 2012 at 23:02:17 UTC, Dmitry Olshansky wrote:
>> On 29-Jul-12 02:31, Era Scarecrow wrote:
>>> On Saturday, 28 July 2012 at 22:19:05 UTC, Dmitry Olshansky wrote:
>>>> Mhm? Not getting you it at all. What's locking it ?
>>>>
>>>> What's wrong with:
>>>> struct BitArray
>>>> {
>>>>
>>>> union {
>>>>    struct {
>>>>        size_t[] data;
>>>>        ulong length;    //in bits
>>>>
>>>> //length in words would be just
>>>> // (length + size_t.sizeof*4) / // (size_t.sizeof*8);
>>>> // as it's a power of 2 it should get optimized
>>>>
>>>>    }
>>>>    size_t[2+ulong.sizeof/size_t.sizeof] compact;
>>>>
>>>> // and say 1 bit is used for isCompact would be great if it could use
>>>> // the least significant bit of pointer from data[], dunno if it's
>>>> easy enough
>>>
>>> I'd rather not rely on the pointer at all as a reference for if it's
>>> compact.
>>
>> No problem, I redesigned a bit:
>>
>> struct BitArray
>> {
>>
>> union {
>>     struct {
>>         ulong length;    //in bits, the top bit used as isCompact
>>         size_t[] data;
>>     }
>>     struct{
>>     ushort smallLength; //here as well
>>     ubyte[size_t.sizeof*2+ulong.sizeof-2] compact;
>>     }
>> }
>>
>> Up 2^63 of bits would occupy 2^60 of bytes should be good enough for
>> the years to come.
>>  I'm not seeing it as a problem as at very least processors have to
>> actually be able to address more then 42 (or was it 48?) bits that
>> they can now. (see AMD or Intel or ARM datasheets on 64 bit chips)
>
> 2^63, or drop 8 so it takes up no more than 2^55 bytes of memory. Yes
> that's quite a large chunk.
>
>   Curiously enough the current structure is already smaller than the
> proposed unions.

OK. Now back to the biggest question:
Slices use references (sometimes) to previous bitarray. Since it's a 
slice (like an array) this could be expected.

The "sometimes" part is not god enough! The problem is that small string 
optimization (or small array optimization) doesn't work with reference 
semantics. You may dig up discussion on proposal for small string 
optimization for char[] and string that was shot down on these grounds.
Bulit-in like semantics call for simple ptr/ptr+length struct and you 
can't get that can you?.

E.g.

void mess_with(BitArray ba)
{
ba ~= true;
ba ~= false;
  //does it change anything outside this function? *maybe*
ba[0] ^= true;
ba[1] ^= true;
auto x = ba; //even if used ref BitArray ba, x *maybe* references ba
//is x now a reference or a copy? *maybe*

}

Thus BitArray either becomes value type (that may come as unexpected, 
see your option c/d). Or it becomes full reference type as in option a 
(though it sucks).
Sorry but b is no go, as it makes code unpredictable I'd rather take my 
idea with separate ranges then b.

Solutions:
  a) Drop the union, make all functions @safe (compared to @trusted), 
and they are all ref/slices
  b) Leave as sometimes ref, sometimes not; Use .dup when you NEED to 
ensure different unique copies.
  c) Apply COW (Copy-On-Write) where a known slice type (canExpand is 
false) when a change would apply, it replaces the present slice into 
it's own unique array.
  d) All slices become new dup'ed allocated arrays (Most expensive of 
them all)

Because of d, I proposed that slices be separate type that is always a 
reference to original. Then only coping arrays themselves involves dup.

>
>> Begin takes extra ulong I'm not sure it is needed in the container
>> itself.
>
>   No it doesn't. bitfields are used to handle the beginning/ending
> offsets, equaling a total of 14 bits. It's the most confusing part
> (aside from [offset + sliceoffset + i[ formulas). canExpand and
> isCompact fill the other two bits.

OK, finally I think I understand how your current version works. It's 
clever but leaves you in half-way value-reference state.

To point out few problems with unification of slice type and BitArray:

- Every time a small slice is taken say [1..30] it becomes a small 
array(?) thus it is a value type and doesn't affect original array

- No idea if a = b make a copy will make programmers nervous, esp if one 
is writing a library function as in this case there is no assumption 
that fits all

- Same thing but more, does this work?
	auto c = ba; //if ba is compact
	ba ~= true;
	assert(c.length = ba.length-1);
	//c should remain with the same length unaffected (at least like built 
in arrays)

-- 
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list