BitArray - preview of what's to come?
Era Scarecrow
rtcvb32 at yahoo.com
Wed May 30 14:22:59 PDT 2012
On Wednesday, 30 May 2012 at 19:43:32 UTC, Dmitry Olshansky wrote:
> On 30.05.2012 23:25, Era Scarecrow wrote:
> That overhead is for concatenation? Did it ever occur to
> you that arrays do sometimes reallocate on append? And of
> course a ~ b should produce a new copy (at least
> semantically). Why shouldn't a bit array follow the same
> design?
Yes, a ~ b of course creates a new one; a ~= true or a ~= b
is different. To do slices I keep a simple start/end position
pair alongside the current array (not too unlike a regular
array). Slicing just adjusts the start/end locations, and
'reserved' gets truncated based on the slice. Of course I
might be able to drop that down to a single bit marking
whether it's a slice or the original, but that bit still has
to come from somewhere; due to alignment it would likely
still cost a full 32/64-bit size_t unless I have other data
to pack alongside it.
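Roughly, the layout I have in mind looks like this sketch
(names and details are made up for illustration, not the
actual Phobos BitArray internals):

struct BitArraySketch
{
    size_t[] store;  // shared backing storage
    size_t start;    // first bit of this view
    size_t end_;     // one past the last bit of this view

    @property size_t length() const { return end_ - start; }

    // Slicing only narrows the window; no bits are copied.
    BitArraySketch opSlice(size_t lo, size_t hi)
    {
        assert(lo <= hi && hi <= length);
        return BitArraySketch(store, start + lo, start + hi);
    }

    bool opIndex(size_t i) const
    {
        assert(i < length);
        immutable bit = start + i;
        return ((store[bit / (size_t.sizeof * 8)]
                 >> (bit % (size_t.sizeof * 8))) & 1) != 0;
    }
}

void main()
{
    auto a = BitArraySketch(new size_t[2], 0, 10); // 10-bit view
    auto s = a[2 .. 7];   // only start/end change
    assert(s.length == 5);
}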
> I fail to see how the whole 4-byte word has to be wasted.
> Obviously you can devote 1 byte to a small length, or at
> most 2 bytes in both cases.
Truthfully, only 2 bytes are wasted in the compact version
(as a flag), and 2 bytes are used for the start/end
locations.
>> On 64-bit systems you'll have 32 bytes, 24 available, so
>> 192 bits; or 40 bytes with 32 available, giving 256 bits.
>> I want the current one to at least match the
>> regular/normal one so no space is lost, but it won't be
>> off by much.
>
> Same thought: where do these 8 bytes go? Why would I pay 8
> bytes for possible concatenation if I happen to use fixed
> sizes or resize them rarely? Pay as you go is the sanest
> principle. (Built-in arrays do have an append cache, if you
> didn't know, so they already reserve space in advance(!))
But unfortunately bit arrays aren't normal arrays. If you
fill exactly the number of bits the size_t units hold, you
won't have extra space, and the array would have to be
resized to concatenate (assuming it could be). I can also
only do what makes sense to me: if I don't track 'reserved'
somehow, every concatenation could require a resize/copy to
ensure the slices don't overlap. So a ~= true; a ~= true;
would do two forced resizes/copies, which is silly but would
be required.
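To make that concrete, here's a rough sketch of how a
'reserved' count lets append reuse spare bits instead of
copying every time (hypothetical layout, not the real
implementation):

struct AppendSketch
{
    size_t[] store;
    size_t used;      // bits in use
    size_t reserved;  // total bits allocated

    void opOpAssign(string op : "~")(bool bit)
    {
        if (used == reserved)  // out of spare bits: grow once
        {
            store.length = store.length ? store.length * 2 : 1;
            reserved = store.length * size_t.sizeof * 8;
        }
        if (bit)
            store[used / (size_t.sizeof * 8)] |=
                size_t(1) << (used % (size_t.sizeof * 8));
        ++used;
    }
}

void main()
{
    AppendSketch a;
    a ~= true;  // allocates once
    a ~= true;  // fits in the reserved space, no copy
    assert(a.used == 2);
}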
>> If you drop it down to a simple pointer (like the
>> original), then slices and extra features and
>> optimizations go away.
>
> Slice could be another type, BTW. That type should be
> implicitly convertible to BitArray on assignment; thus the
> slice would have a start/end pair of ulongs while the array
> itself wouldn't. Sounds good?
Sounds workable... although it increases the complexity a
bit, with even more functions to write just to cover the
compatibility and the differences. Since BitArray is a
struct rather than a class this gets harder; if it were a
class some of this would be easier, but then the hidden
overhead becomes greater than the struct's overhead. There's
also the case where you want an uneven array not divisible
by size_t bits: you'd get stuck with a slice anyway, or the
length would be wrong all the time.
a.length = 10;
assert(a.length == 10); // breaks! length is 32/64
a.length = 32;
assert(a.length == 32); // okay on 32-bit machines
a ~= true;
assert(a.length == 33); // breaks! length is now 64/128
Choices, choices... Simpler seems better.
Having it implicitly convertible means that any time you mix
a BitArray and a slice, either the slice needs to reallocate
so its beginning offset is 0 rather than possibly anywhere
from 0 to size_t*8; or making a slice actually makes a
unique copy each time, which is its own overhead but keeps
the structure smaller; or we make a separate slice version
of BitArray, which once again doubles the complexity and
requires helper versions for all of them. The simple bit
array suddenly becomes very large and complex just to save a
few bytes.
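The copy-on-conversion option might look something like this
sketch (all names hypothetical; alias this would supply the
implicit conversion in D):

struct BitsSketch
{
    size_t[] store;
    size_t length;
}

struct BitSliceSketch
{
    size_t[] store;
    size_t start, end_;

    // Converting a slice back to a standalone array copies
    // the bits out, so the result starts at offset 0 and
    // shares no storage with the original.
    BitsSketch toArray() const
    {
        auto words = (end_ - start + size_t.sizeof * 8 - 1)
                     / (size_t.sizeof * 8);
        auto r = BitsSketch(new size_t[words], end_ - start);
        foreach (i; 0 .. r.length)
        {
            immutable bit = start + i;
            if ((store[bit / (size_t.sizeof * 8)]
                 >> (bit % (size_t.sizeof * 8))) & 1)
                r.store[i / (size_t.sizeof * 8)] |=
                    size_t(1) << (i % (size_t.sizeof * 8));
        }
        return r;
    }
    alias toArray this; // slice usable where an array is expected
}

void main()
{
    auto s = BitSliceSketch(new size_t[1], 3, 9);
    BitsSketch a = s;  // implicit conversion: copy + rebase to 0
    assert(a.length == 6);
}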
When I did C programming before, I was always anal about
space, which I now find rather silly. I would do something
like
struct {
    unsigned int a : 5;
    unsigned int b : 5;
    unsigned int c : 10;
    unsigned int d : 2;
}
etc., etc. Any time my requirements changed I had to rework
the bit widths, and the size difference between 4 bytes and
32 bytes only matters when the object is created many times.
I have gigs of memory free, and with only a handful of these
objects made at any given time it all seems silly. Maybe I
have it all wrong, or maybe there's much more to worry about
than I originally thought.