BitArray/BitFields - Review

Era Scarecrow rtcvb32 at yahoo.com
Sat Jul 28 16:16:37 PDT 2012


On Saturday, 28 July 2012 at 23:02:17 UTC, Dmitry Olshansky wrote:
> On 29-Jul-12 02:31, Era Scarecrow wrote:
>> On Saturday, 28 July 2012 at 22:19:05 UTC, Dmitry Olshansky 
>> wrote:
>>> Mhm? Not getting you it at all. What's locking it ?
>>>
>>> What's wrong with:
>>> struct BitArray
>>> {
>>>
>>> union {
>>>    struct {
>>>        size_t[] data;
>>>        ulong length;    //in bits
>>>
>>> //length in words would be just
>>> // (length + size_t.sizeof*4) / // (size_t.sizeof*8);
>>> // as it's a power of 2 it should get optimized
>>>
>>>    }
>>>    size_t[2+ulong.sizeof/size_t.sizeof] compact;
>>>
>>> // and say 1 bit is used for isCompact would be great if it 
>>> could use
>>> // the least significant bit of pointer from data[], dunno if 
>>> it's easy enough
>>
>> I'd rather not rely on the pointer at all as a reference for 
>> if it's compact.
>
> No problem, I redesigned a bit:
>
> struct BitArray
> {
>
> union {
>     struct {
>         ulong length;    //in bits, the top bit used as 
> isCompact
>         size_t[] data;
>     }
>     struct{
> 	ushort smallLength; //here as well
> 	ubyte[size_t.sizeof*2+ulong.sizeof-2] compact;
>     }
> }
>
> Up 2^63 of bits would occupy 2^60 of bytes should be good 
> enough for the years to come.
>  I'm not seeing it as a problem as at very least processors 
> have to actually be able to address more then 42 (or was it 
> 48?) bits that they can now. (see AMD or Intel or ARM 
> datasheets on 64 bit chips)

2^63, or drop 8 so it takes up no more than 2^55 bytes of memory. 
Yes that's quite a large chunk.

  Curiously enough the current structure is already smaller than 
the proposed unions.

>> If the array is changed later to a template that could accept 
>> bytes, then where does that leave you? When the slice doesn't 
>> need to know more of the beginning,

> ???
> Slice can start at some random offset

  Yes they can, but so can a normal array (as they are pretty much 
the same)

>>  it can re-slice it to a lesser more appropriate size.
>
> no idea what's you are talking about here.

  As for requested compact to hold the starting/ending offsets at 
1 byte each, obviously slicing at say 1024 to 2048 of a 10,000 
length bit array, that won't work (0-127). If it's larger than 
size_t in bits, it will subtract that size and internally change 
the pointer of the array it actually points to, until it can fit 
that new value.

start/end slice: [32 .. 64]
equals: size_t array[1 .. 2]

>> It may very well soon contain a third entry in the union for 
>> main operations and use the larger size_t just for canUseBulk 
>> (Actually that sounds better to me).
>
> Again I'm having hard time to get what follows from where. Can 
> you expand on this point please? Preferably with some code as 
> it was far easier to reason about.

A question earlier was brought up about endienness for the patch, 
that is not part of what it can handle, so:

ba[100] = 1; //then save the array
assert(ba[100]); //fails on opposite endienness when loaded

HOWEVER! if individual bits are handled in a byte array 
reference, then the above would succeed, even with the larger 
size_t ones all working with canUseBulk.

>> Besides, separating the slice suggests 'length' is not part of 
>> the union struct.
>
> How? All containers know their length.

  Yes they do, they also know where they start, thereby giving 
start/ending, which allows slicing. So why separate slicing?

>> Sure it's convertible to [0 .. length], but it's no longer 
>> convertible back to BitArray (Unless you make a whole new 
>> array and conversion/copying which takes a lot of time).
>
> Yes it's not convertible. And it's no problem and is intended. 
> Otherwise like you said it doesn't make a lot of sense.

>>> If you do that, why not do BitSlice from the beginning which 
>>> holds BitArray, but then if you do that why make them 
>>> separate at all? It doesn't make sense to me. I can see 
>>> making a separate range for the front/popFront/empty if you 
>>> really wanted to use those, but it wouldn't change how it 
>>> would work.

> Begin takes extra ulong I'm not sure it is needed in the 
> container itself.

  No it doesn't. bitfields are used to handle the beginning/ending 
offsets, equaling a total of 14 bits. It's the most confusing 
part (aside from [offset + sliceoffset + i[ formulas). canExpand 
and isCompact fill the other two bits.


More information about the Digitalmars-d-learn mailing list