BitArray/BitFields - Resumed and polishing
Era Scarecrow
rtcvb32 at yahoo.com
Wed Jan 2 18:05:05 PST 2013
On Wednesday, 2 January 2013 at 21:00:38 UTC, Dmitry Olshansky
wrote:
> 12/31/2012 9:35 PM, Era Scarecrow пишет:
> Personally I believe that if we introduce a slice of a BitArray
> as a separate range type that represents a view of array (but
> can't be appended to, etc. it's just a RA range with slicing).
Could do that I suppose, but then either exact length or
append/alignment optimizations may go out the window if we remove
some of those as options. To support those making slices part of
the main type was easy enough. But the other issues still have to
be addressed (that's mentioned below)
>
>> const BitArray x = BitArray(32);
>> func(x);
>>
>
> Then here x is passed either by ref or a copy (depending on
> func).
yeah... assuming func returned a struct of...
struct S {
BitArray x;
//other stuff
}
const(BitArray) func(const BitArray s);
in the case it's small string optimization and the original
string gets allocated OR gets passed outside the original
function that made the BitArray. So small string if possible,
convert to ref and slice it, or dup it. not 100% ref in all cases
but pretty close.
>> Would the function get passed the original compact buffer?
>> Copy of x or a dup? May not be so much an issue. However...
>>
>> BitArray x = BitArray(32);
>> const y = x[]; //slice
>>
> Then y has type BitArraySlice and references x.
Agreed, although I'm still iffy about a separate slice as a
range. Could do the range just to give it limitations, but seems
like a headache to do less and extra code to manage.
Current code I'm writing converts (if possible) the BitArray to
a allocated type, then passes that slice to y.
>> Now if y points to the actual compact buffer and we do the
>> following...
>>
>> x.length = 256; //allocates if it was compact
>>
>> Although y will have const access the data instead of a
>> compact array it's a pointer to an array and broken.
>
> Then IMO that should be a slice invalidation (akin to c++
> iterator invalidation). Depending on how slice is implemented
> it could avoid being invalidated in this case but in some
> others it surely will (e.g. shrinking).
Maybe, with the small string optimization (no allocation) it
isn't so much an easy option. That's what this is about. Say it
uses the compact (fixed string) and in opSlice...
BitArray {
bool isCompact;
bool canExpand; //represents owner for slices or compact.
union {
size_t[] normal;
size_t[2] compact;
}
BitArray opSlice() {
BitArray ba = this;
ba.canExpand = false;
if (isCompact) {
ba.isCompact = false;
ba.normal = cast(size_t[]) this.compact[0 .. $];
}
return ba;
}
}
This is where the allocation could cause a problem as the slice
could be referring to stack memory at which all the data could be
duds. Worse if you overwrite that data suddenly the allocated
memory could point to something else! I think it's safe to say
that's a bad idea.
>> If during the x call it
>
> You mean x[] call i.e. opSlice() call?
well both this(this) and opSlice() would be called at different
times so they would have different meanings (copying vs slicing).
> Seems like an overkill and semantically bad as I totally expect
> that the following to be true for any Array-like container:
> auto cont = ...;
> auto range = cont[];
> range[0] = 1;
> assert(cont[0] == 1);
>
> Also:
> foreach(v; x) --> foreach(v; x[]) if x defines opSlice
>
> however since BitArray defines opApply it's called instead so
> this one is probably OK.
Does that mean we might be able to drop opApply? Hmmm... Depends
on if we have enough implemented so the range accepts it. I'll
glance over std.range.
>> reallocated then y would work fine no matter what in that
>> case. If x can't reallocate it, then the issue remains but is
>> much smaller than before, most likely a dup would be safest.
>>
>> Thoughts? Questions? Ideas?
>
> The link to the latest code would be cool ;)
https://github.com/rtcvb32/phobos/blob/BitArray-Updates/std/bitmanip.d
There's a merge issue due to having tried to keep the bitfields
and bitarray separated; That seems to have made more problems
than solutions. Working on it but may have to wait until I've
read my Git book when it arrives before I understand this all.
> I'm thinking that the main problem is trying to mimic built-in
> arrays.
Not really, it's trying to do small string optimization. I
suppose if this was done as a class instead most of these
problems could/would go away.
> They are not ordinary containers in that they easily combine
> slice and container that is (in fact) partially hidden in
> run-time & GC implementation.
>
> The fact that combination of language features and run-time
> support makes it easy shouldn't encourage other to go this path
> in the library.
>
> For one thing this way a small string optimization is not
> achievable because of the many slices that have to reference
> the data packed in _small_ chunk. Then whenever original array
> changes (and becomes not small) any other slice to it have to
> somehow reflect this change. And it's hard if slices internally
> are exactly the same as original (if the same type).
Really depends on the use case. I'm guessing there's enough call
for the 64bit small packed bitarray that justifies it, otherwise
it would be better to throw it out. As you mention later,
separating them does seem like a better idea.
> ...
>
> Now that I tested the current behavior that we shouldn't break
> a lot. Clearly current BitArray is old and rusty D1 design.
>
> BitArray a, b, c;
> //the interesting thing is that BitArray(32) doesn't
> construct BitArray with length 32 but some explosive garbage !!
>
> c.length = 512;
> a.length = 32;
> assert(a.length == 32);
> b = a;
> b[0] = true;
> //this one is bad as it means both a & b has to reference the
> same data
> assert(a[0] == 1);
> b ~= c;
> b[1] = 1;
> //the next fails for me (it also may not fail depending on
> c's length)
> assert(a[1] == 1);
My code tests that the last two asserts fail, course, a being
under 64bits remains with small string optimization. Making it
slice... seems I'll add this to my list of tests and get it
fixed. Even if i reserved the space it still failed the last one.
Mmm...
> Currently I think that D container design have to be extended
> to include 2 kinds of containers:
>
> -small containers, values type a-la C++ STL these all use
> small container optimization (as they can cleanly)
> -big heavy-duty ones, these have reference semantics and tuned
> for large sets of values
>
> Then the proposed way out is to make 2 containers:
> - current BitArray to always have reference semantics (both
> asserts in the above always pass);
> - and BitString or BitVector to be small value semantic array
> that is always copied/duped on assign, these should be passed
> by ref or by range/slice.
>
> That's was lengthy but what do you think? And it seems like
> we'd need to bring this last topic to general D NG.
hmmm.. Either would need some code duplication, or template to
turn off/swap small string optimization. But separating them does
seem like a good idea.
More information about the Digitalmars-d-learn
mailing list