BitArray/BitFields - Review

Sun Jul 29 14:36:35 PDT 2012

Era Scarecrow:

>  If it has to determine between compact and non-compact, then I 
> really don't see how any speed improvement can be made,

Maybe it's better to consider two use cases: dynamic length 
allocated on the heap (or allocated with a given allocator, often 
a stack-styled allocator), and fixed-size allocated in place, so 
there's no need for that run-time test.

> and if it's optimized it may be inlined which would be about as 
> fast as you could get.

I think that  "a[x] = 1;"  is slower than  "a.set(x);"  even if 
the array a is allocated in place and everything is inlined. This 
is probably the most important operation an array of bits has to 
support. If this is missing, I have to use my own implementation 
still.

>  Likely similar to the hamming weight table mentioned in TDPL. 
> Combined with the canUseBulk I think I could make it fairly 
> fast.

There is lot of literature about implementing this operation 
efficiently. For the first implementation a moderately fast (and 
short) code is probably enough. Later faster versions of this 
operation will go in Phobos, coming from papers.

>  Yes, currently a debate of it's partial ref/value issue is 
> coming up. As it is it's usable, and if you want to be sure 
> it's unique you can always dup it, otherwise as a previous 
> suggestion I'll either make two array types for either full 
> reference vs value, or take his separate slices (reference) 
> idea.

The idea of making two different types (for the dynamic and 
static versions) sounds nice.

There is a third use case, that is bit arrays that start small 
and can grow, and rarely grow bigger than one or two words. This 
asks for a dynamic-length bit array type that allocates locally 
only when it's small (so it needs to perform run-time test of the 
tag. Too bad the D GC doesn't support tagged pointers!). But 
probably this use case is not common enough to require a third 
type of array. So I suggest to not worry about this case now, and 
to focus on the other two more common ones.

>  A Matrix is just a multidimensional array right?

Well, yes, but there are many ways to implement arrays and nD 
arrays. Sometimes the implementations differ.

> I'll have to read up on it a bit; doing simple division (Likely 
> whole rows of 32bit at a time) would do that job,

Take a look at my implementation in Bugzilla :-) It's another 
type, the third. The 1D and 2D cases are the most common.

> but expanding width several times (compared to height) would be 
> quite a bit slower; Although not too horrible.

A 2D array of bits probably doesn't need to change its size after 
creation, this is not a common use case.

>  more likely bool array/matrix.. Although if you heavily use 
> specific lines it could convert to bool[], and then back again, 
> but if you go all over the place it would be worse than just 
> using bool[].

It's not so simple :-) A bool[] causes many more cache misses, if 
the matrix is not tiny. A bool[] or bool[][] is a good idea only 
if the matrix is very small, or if you don't have a bit matrix 
library and you don't want to write lot of code. Converting to 
bool[] the current line is not a good idea.

Bye,
bearophile