BitArray/BitFields - Resumed and polishing

Wed Jan 2 13:00:31 PST 2013

12/31/2012 9:35 PM, Era Scarecrow пишет:
>   As BitArray is coming back up and my resumed work I'll comment a few
> questions and suggestions and go from there. I'll polish up the code and
> try to resubmit it.

Yay!

> On Saturday, 28 July 2012 at 21:07:31 UTC, Jonathan M Davis wrote:
>> I would point out that while hasSlicing doesn't currently check for
>> it, if opSlice doesn't return a type which is assignable to the
>> original range, it won't work in a lot of Phobos functions. I keep
>> meaning to bring that up for discussion in the main newsgroup. I'd
>> argue that hasSlicing really be changed from
>
>   Hmmm.. Well Since the 'sometimes ref' as discussed won't seem to work
> well from before I'm considering to put the following two rules into
> effect.
> 1) If it's an allocated block, it stays allocated.
> 2) If you dup a BitArray and it's small enough to be compact, it converts.
> 3) If it's a small/compact array and you give out a slice, it converts
> the original block to dynamic before returning the new slice.
>
>   With 1 & 2, reserved and length work a little differently, more ref
> friendly.
>   With 3 the only issue comes up with if it's const, or a local variable.
>

Personally I believe that if we introduce a slice of a BitArray as a 
separate range type that represents a view of array (but can't be 
appended to, etc. it's just a RA range with slicing).

>   const BitArray x = BitArray(32);
>   func(x);
>

Then here x is passed either by ref or a copy (depending on func).

>   Would the function get passed the original compact buffer? Copy of x
> or a dup?
>   May not be so much an issue. However...
>
>   BitArray x = BitArray(32);
>   const y = x[]; //slice
>
Then y has type BitArraySlice and references x.

>   Now if y points to the actual compact buffer and we do the following...
>
>   x.length = 256; //allocates if it was compact
>
>   Although y will have const access the data instead of a compact array
> it's a pointer to an array and broken.

Then IMO that should be a slice invalidation (akin to c++ iterator 
invalidation). Depending on how slice is implemented it could avoid 
being invalidated in this case but in some others it surely will (e.g. 
shrinking).

> If during the x call it

You mean x[] call i.e. opSlice() call?
Seems like an overkill and semantically bad as I totally expect that the 
following to be true for any Array-like container:
auto cont = ...;
auto range = cont[];
range[0] = 1;
assert(cont[0] == 1);

Also:
foreach(v; x) --> foreach(v; x[]) if x defines opSlice

however since BitArray defines opApply it's called instead so this one 
is probably OK.

> reallocated then y would work fine no matter what in that case. If x
> can't reallocate it, then the issue remains but is much smaller than
> before, most likely a dup would be safest.
>
>   Thoughts? Questions? Ideas?

The link to the latest code would be cool ;)

I'm thinking that the main problem is trying to mimic built-in arrays.

They are not ordinary containers in that they easily combine slice and 
container that is (in fact) partially hidden in run-time & GC 
implementation.

The fact that combination of language features and run-time support 
makes it easy shouldn't encourage other to go this path in the library.

For one thing this way a small string optimization is not achievable 
because of the many slices that have to reference the data packed in 
_small_ chunk. Then whenever original array changes (and becomes not 
small) any other slice to it have to somehow reflect this change. And 
it's hard if slices internally are exactly the same as original (if the 
same type).

...

Now that I tested the current behavior that we shouldn't break a lot. 
Clearly current BitArray is old and rusty D1 design.

     BitArray a, b, c;
     //the interesting thing is that BitArray(32) doesn't construct 
BitArray with length 32 but some explosive garbage !!

     c.length = 512;
     a.length = 32;
     assert(a.length == 32);
     b = a;
     b[0] = true;
//this one is bad as it means both a & b has to reference the same data
     assert(a[0] == 1);
     b ~= c;
     b[1] = 1;
     //the next fails for me (it also may not fail depending on c's length)
     assert(a[1] == 1);

Currently I think that D container design have to be extended to include 
2 kinds of containers:

  -small containers, values type a-la C++ STL these all use small 
container optimization (as they can cleanly)
  -big heavy-duty ones, these have reference semantics and tuned for 
large sets of values

Then the proposed way out is to make 2 containers:
- current BitArray to always have reference semantics (both asserts in 
the above always pass);
- and BitString or BitVector to be small value semantic array that is 
always copied/duped on assign, these should be passed by ref or by 
range/slice.

That's was lengthy but what do you think? And it seems like we'd need to 
bring this last topic to general D NG.

-- 
Dmitry Olshansky