BitArray - preview of what's to come?

Era Scarecrow rtcvb32 at yahoo.com
Wed May 30 14:22:59 PDT 2012


On Wednesday, 30 May 2012 at 19:43:32 UTC, Dmitry Olshansky wrote:
> On 30.05.2012 23:25, Era Scarecrow wrote:

> That overhead is for concatenation? Did it ever occur to you 
> that arrays do sometimes reallocate on append? And of course 
> a ~ b should produce a new copy (at least semantically).
> Why shouldn't a bit array follow the same design?

  Yes, a ~ b of course creates a new one; a ~= true or a ~= b is 
different. To do slices I used a simple start/end position pair 
that refers into the current array (not too unlike a regular 
array). When you slice, it just adjusts the start/end locations, 
and 'reserved' gets truncated based on the slice. I might be able 
to drop that down to a single bit marking whether it's a slice or 
the original, but that bit still has to come from somewhere; due 
to alignment it would likely still cost a full 32/64-bit size_t, 
unless I have other stuff to pack alongside it.
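
  Roughly the layout I mean (a sketch only; the names are 
illustrative, not the actual code):

struct BitArray
{
    size_t[] store; // underlying storage, shared between slices
    size_t start;   // first bit position within store
    size_t end;     // one past the last bit position

    // slicing just adjusts the start/end pair; storage is shared
    BitArray opSlice(size_t lo, size_t hi)
    {
        BitArray s = this;
        s.start = start + lo;
        s.end = start + hi;
        return s;
    }

    @property size_t length() const { return end - start; }
}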

> I fail to see why the whole 4-byte word has to be wasted. 
> Obviously you can devote 1 byte to a small length, or at most 
> 2 bytes in both cases.

  Truthfully, only 2 bytes are wasted for the compact version (as 
a flag), and 2 bytes are used for the start/ending locations.
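
  Something like this is what I have in mind (purely illustrative; 
the real field layout may well differ):

struct Header
{
    ushort flags; // compact-vs-allocated flag: the "wasted" 2 bytes
    ubyte start;  // start position, in bits
    ubyte end;    // end position, in bits
}
static assert(Header.sizeof == 4); // fits the 4-byte word in question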

>> On 64-bit systems you'll have 32 bytes, 24 available, so 192 
>> bits; or 40 bytes with 32 available, giving 256 bits. I will 
>> get the current one to at least match the regular/normal one 
>> so no space is lost; in any case the difference won't be much.
>
> Same thought: where do these 8 bytes go? Why would I pay 8 
> bytes for possible concatenation if I happen to use fixed sizes 
> or resize them rarely? Pay-as-you-go is the most sane principle.
> (Built-in arrays do have an append cache, if you didn't know, 
> so they already reserve space in advance(!))

  But unfortunately bit arrays aren't normal arrays. If you fill 
exactly the number of size_t units allocated, then you have no 
extra space and would have to resize just to concatenate 
(assuming you could). I can also only do what makes sense to me: 
if I don't track 'reserved' somehow, every append could require a 
resize/copy to ensure the slices don't overlap. So a ~= true; 
a ~= true; would force two resizes/copies, which is silly but 
would be required.
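
  To illustrate with the earlier sketch (a hypothetical append, 
not the actual code): without a reserved count, the only safe 
move is to copy on every append, since another slice may share 
the same storage.

enum bitsPerWord = size_t.sizeof * 8;

// hypothetical append: with no 'reserved' count, copy every time,
// because another slice may be sharing the same storage
void append(ref BitArray a, bool bit)
{
    a.store = a.store.dup;             // forced copy, every append
    if (a.end == a.store.length * bitsPerWord)
        a.store ~= 0;                  // grow by one size_t
    if (bit)
        a.store[a.end / bitsPerWord] |=
            (cast(size_t) 1) << (a.end % bitsPerWord);
    else
        a.store[a.end / bitsPerWord] &=
            ~((cast(size_t) 1) << (a.end % bitsPerWord));
    ++a.end;
}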

>> If you drop it down to a simple pointer (like the original), 
>> then slices and the extra features and optimizations go away.
>
> A slice could be another type, BTW. That type would be 
> implicitly convertible to BitArray on assignment. Thus the 
> slice would have a start/end pair of ulongs while the array 
> itself wouldn't. Sounds good?

  Sounds workable... although it increases the complexity a bit, 
and means even more functions to write just to cover the 
compatibility and the differences. BitArray is a struct rather 
than a class; if it were a class some of this would be easier, 
but then the hidden class overhead becomes greater than the 
struct overhead. There's also the case where you want an uneven 
array, one not divisible by size_t bits: you'd get stuck holding 
a slice anyway, or the length would be wrong all the time.

  a.length = 10;
  assert(a.length == 10); // breaks! length rounds up to 32/64
  a.length = 32;
  assert(a.length == 32); // okay on 32-bit machines
  a ~= true;
  assert(a.length == 33); // breaks! length is now 64/128
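
  (That is, without a start/end pair the only length the array 
can report is whole words; a sketch of why:)

// without a start/end pair, length can only count whole words
@property size_t length() const
{
    return store.length * bitsPerWord; // always a multiple of 32/64
}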

  Choices, choices... simpler seems better.

  Having it implicitly convertible means that any time you want 
both a BitArray and a slice of it, one of three things has to 
happen: the slice reallocates so its beginning offset is 0 rather 
than anywhere from 0 to size_t*8; or taking a slice makes a 
unique copy every time, which is its own overhead but keeps the 
structure smaller; or there's a separate slice version of 
BitArray, which once again doubles the complexity and requires 
helper versions for all of them. The simple BitArray is suddenly 
becoming very large and complex just to save a few bytes.
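
  For what it's worth, a rough sketch of that separate-slice-type 
idea (an alternative to the earlier layout; all names and details 
are guesses, not an actual implementation), where assigning a 
slice back re-packs the bits so the copy starts at offset 0:

enum bitsPerWord = size_t.sizeof * 8;

struct BitArray   // no start/end pair on the array itself
{
    size_t[] store;
    size_t len;   // length in bits
}

struct BitSlice   // the start/end pair lives only on the slice
{
    BitArray* source;
    ulong start, end;

    // converting back copies/shifts bits to start at offset 0
    BitArray toBitArray() const
    {
        BitArray r;
        r.len = cast(size_t)(end - start);
        r.store.length = (r.len + bitsPerWord - 1) / bitsPerWord;
        foreach (i; 0 .. r.len)
        {
            immutable pos = cast(size_t)(start + i);
            if (source.store[pos / bitsPerWord]
                    & (cast(size_t) 1) << (pos % bitsPerWord))
                r.store[i / bitsPerWord] |=
                    (cast(size_t) 1) << (i % bitsPerWord);
        }
        return r;
    }
    alias toBitArray this; // implicit conversion on assignment
}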

  When I did C programming I was always anal about space, which I 
now find rather silly. I would do something like

struct Flags {
  unsigned int a : 5;  /* bit-fields let the members share */
  unsigned int b : 5;  /* one machine word */
  unsigned int c : 10;
  unsigned int d : 2;  /* 22 bits used: packs into 4 bytes */
};

  etc., etc. Any time my requirements changed, all I had to adjust 
were a few bit widths. But fussing over the difference between 4 
bytes and 32 bytes per object seems small and silly now: I have 
gigs of memory free and only a handful of these objects alive at 
any given time.

  Maybe I have it all wrong, or maybe there's much more to worry 
about than I originally thought.

