BitArray/BitFields - Resumed and polishing

Wed Jan 2 18:05:05 PST 2013

On Wednesday, 2 January 2013 at 21:00:38 UTC, Dmitry Olshansky 
wrote:
> 12/31/2012 9:35 PM, Era Scarecrow пишет:

> Personally I believe that if we introduce a slice of a BitArray 
> as a separate range type that represents a view of array (but 
> can't be appended to, etc. it's just a RA range with slicing).

  Could do that I suppose, but then either exact length or 
append/alignment optimizations may go out the window if we remove 
some of those as options. To support those making slices part of 
the main type was easy enough. But the other issues still have to 
be addressed (that's mentioned below)

>
>>  const BitArray x = BitArray(32);
>>  func(x);
>>
>
> Then here x is passed either by ref or a copy (depending on 
> func).

  yeah... assuming func returned a struct of...

  struct S {
    BitArray x;
    //other stuff
  }

  const(BitArray) func(const BitArray s);

  in the case it's small string optimization and the original 
string gets allocated OR gets passed outside the original 
function that made the BitArray. So small string if possible, 
convert to ref and slice it, or dup it. not 100% ref in all cases 
but pretty close.

>>  Would the function get passed the original compact buffer? 
>> Copy of x or a dup?  May not be so much an issue. However...
>>
>>  BitArray x = BitArray(32);
>>  const y = x[]; //slice
>>
> Then y has type BitArraySlice and references x.

  Agreed, although I'm still iffy about a separate slice as a 
range. Could do the range just to give it limitations, but seems 
like a headache to do less and extra code to manage.

  Current code I'm writing converts (if possible) the BitArray to 
a allocated type, then passes that slice to y.

>>  Now if y points to the actual compact buffer and we do the 
>> following...
>>
>>  x.length = 256; //allocates if it was compact
>>
>>  Although y will have const access the data instead of a 
>> compact array it's a pointer to an array and broken.
>
> Then IMO that should be a slice invalidation (akin to c++ 
> iterator invalidation). Depending on how slice is implemented 
> it could avoid being invalidated in this case but in some 
> others it surely will (e.g. shrinking).

  Maybe, with the small string optimization (no allocation) it 
isn't so much an easy option. That's what this is about. Say it 
uses the compact (fixed string) and in opSlice...

BitArray {
   bool isCompact;
   bool canExpand; //represents owner for slices or compact.
   union {
     size_t[] normal;
     size_t[2] compact;
   }
   BitArray opSlice() {
     BitArray ba = this;
     ba.canExpand = false;
     if (isCompact) {
       ba.isCompact = false;
       ba.normal = cast(size_t[]) this.compact[0 .. $];
     }

     return ba;
   }
}

  This is where the allocation could cause a problem as the slice 
could be referring to stack memory at which all the data could be 
duds. Worse if you overwrite that data suddenly the allocated 
memory could point to something else! I think it's safe to say 
that's a bad idea.

>> If during the x call it
>
> You mean x[] call i.e. opSlice() call?

  well both this(this) and opSlice() would be called at different 
times so they would have different meanings (copying vs slicing).

> Seems like an overkill and semantically bad as I totally expect 
> that the following to be true for any Array-like container:
> auto cont = ...;
> auto range = cont[];
> range[0] = 1;
> assert(cont[0] == 1);
>
> Also:
> foreach(v; x) --> foreach(v; x[]) if x defines opSlice
>
> however since BitArray defines opApply it's called instead so 
> this one is probably OK.

  Does that mean we might be able to drop opApply? Hmmm... Depends 
on if we have enough implemented so the range accepts it. I'll 
glance over std.range.

>> reallocated then y would work fine no matter what in that 
>> case. If x can't reallocate it, then the issue remains but is 
>> much smaller than before, most likely a dup would be safest.
>>
>>  Thoughts? Questions? Ideas?
>
> The link to the latest code would be cool ;)

https://github.com/rtcvb32/phobos/blob/BitArray-Updates/std/bitmanip.d

  There's a merge issue due to having tried to keep the bitfields 
and bitarray separated; That seems to have made more problems 
than solutions. Working on it but may have to wait until I've 
read my Git book when it arrives before I understand this all.

> I'm thinking that the main problem is trying to mimic built-in 
> arrays.

  Not really, it's trying to do small string optimization. I 
suppose if this was done as a class instead most of these 
problems could/would go away.

> They are not ordinary containers in that they easily combine 
> slice and container that is (in fact) partially hidden in 
> run-time & GC implementation.
>
> The fact that combination of language features and run-time 
> support makes it easy shouldn't encourage other to go this path 
> in the library.
>
> For one thing this way a small string optimization is not 
> achievable because of the many slices that have to reference 
> the data packed in _small_ chunk. Then whenever original array 
> changes (and becomes not small) any other slice to it have to 
> somehow reflect this change. And it's hard if slices internally 
> are exactly the same as original (if the same type).

  Really depends on the use case. I'm guessing there's enough call 
for the 64bit small packed bitarray that justifies it, otherwise 
it would be better to throw it out. As you mention later, 
separating them does seem like a better idea.

> ...
>
> Now that I tested the current behavior that we shouldn't break 
> a lot. Clearly current BitArray is old and rusty D1 design.
>
>     BitArray a, b, c;
>     //the interesting thing is that BitArray(32) doesn't 
> construct BitArray with length 32 but some explosive garbage !!
>
>     c.length = 512;
>     a.length = 32;
>     assert(a.length == 32);
>     b = a;
>     b[0] = true;
> //this one is bad as it means both a & b has to reference the 
> same data
>     assert(a[0] == 1);
>     b ~= c;
>     b[1] = 1;
>     //the next fails for me (it also may not fail depending on 
> c's length)
>     assert(a[1] == 1);

  My code tests that the last two asserts fail, course, a being 
under 64bits remains with small string optimization. Making it 
slice... seems I'll add this to my list of tests and get it 
fixed. Even if i reserved the space it still failed the last one. 
Mmm...

> Currently I think that D container design have to be extended 
> to include 2 kinds of containers:
>
>  -small containers, values type a-la C++ STL these all use 
> small container optimization (as they can cleanly)
>  -big heavy-duty ones, these have reference semantics and tuned 
> for large sets of values
>
> Then the proposed way out is to make 2 containers:
> - current BitArray to always have reference semantics (both 
> asserts in the above always pass);
> - and BitString or BitVector to be small value semantic array 
> that is always copied/duped on assign, these should be passed 
> by ref or by range/slice.
>
> That's was lengthy but what do you think? And it seems like 
> we'd need to bring this last topic to general D NG.

  hmmm.. Either would need some code duplication, or template to 
turn off/swap small string optimization. But separating them does 
seem like a good idea.