BitArray/BitFields - Resumed and polishing

Era Scarecrow rtcvb32 at yahoo.com
Thu Jan 3 12:11:21 PST 2013


On Thursday, 3 January 2013 at 15:48:50 UTC, Dmitry Olshansky 
wrote:
> 1/3/2013 2:20 PM, Era Scarecrow wrote:
>> Suddenly it won't work and slicing is only a range and can 
>> only be used in foreach.
>
> No surprise here. Basically a container != a range over it. It 
> all flows from there. A range doesn't have append, nor does it 
> own any data.
> About foreach see below. Yes, the BitArray container won't pass 
> the hasSlicing trait, but it shouldn't, as that trait is meant 
> for ranges. (so in a sense my comment about trying to mimic a 
> built-in array applies)
>
> And a slice can have opIndex and opSlice, being a random-access 
> range of bool.

  Hmmm... so I shouldn't try to make it as close to an actual 
array as possible. I feel like I'm missing something very simple 
that would make it all correct.

> To help people convert one to the other (exactly where they 
> need it) we can add .dup(?) for the slice that allocates a 
> BitArray and assigns values from the range.
>
>> Correct? You could do appending but likely it would require 
>> two functions in the original code (one for a BitArray and one 
>> for a slice).
>
> Again I don't think there is a need to append to a slice. 
> Create an array out of the slice (or in fact any range of 
> bool), then append.

  I want to say you're wrong. I'm thinking heavily of fields like 
compression or encryption, where random access and appending 
could be needed at any step. I can see building a simple table of 
the compressed sequences (say, Huffman codes) and then appending 
the slices, which greatly simplifies the interface. I can also 
see where appending to a slice automatically allocating a new 
block of memory would be useful, since you know you're starting 
fresh. I can write a short test set of functions and post them if 
you like; doing it with ranges & slices otherwise might not allow 
such diversity (simplicity?) in the code.
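  For illustration, something like this rough sketch is what I 
have in mind. The code table is made up rather than a real 
Huffman build, and it assumes std.bitmanip.BitArray's bool[] 
constructor and ~= concatenation:

import std.bitmanip : BitArray;
import std.stdio;

void main()
{
    // Made-up code table: symbol -> bit pattern. Not an actual
    // Huffman construction, just enough to show appending
    // pre-built sequences of bits onto one output array.
    BitArray[char] codes;
    codes['a'] = BitArray([true]);            // "1"
    codes['b'] = BitArray([false, true]);     // "01"
    codes['c'] = BitArray([false, false]);    // "00"

    BitArray output;
    foreach (ch; "abca")
        output ~= codes[ch];                  // append each code's bits

    writeln(output.length);                   // 1 + 2 + 2 + 1 = 6 bits
}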

>> If it supports an opAssign to handle that, now you have to 
>> take care of all the opAssigns as well (like C++: once you 
>> handle one portion of it and opCast, you handle them all).
>
> Yes, I'm afraid that for optimal speed they both have to be 
> implemented. The version for BitArray can just forward to the 
> one for a slice over the whole of it. Or there could be a 
> common template function that handles all of the bit-ops over 
> [a, b) portions of BitArrays. Everything else then forwards to 
> it.
>
>>>> Does that mean we might be able to drop opApply? Hmmm... 
>>>> Depends on whether we have enough implemented so that the 
>>>> range accepts it. I'll glance over std.range.
>>>
>>> Yes, it's got to be a range to work like this and then 
>>> opApply is useless.
>>
>> Maybe. Foreach, if I remember right, worked three different 
>> ways and selected whichever one worked: opApply, 
>> front/popFront/empty, and array-style (opIndex access). If 
>> enough is implemented that it recognizes it as an array, then 
>> opApply can be removed.
>
> There are 2 ways: range or opApply. How built-in arrays are 
> done is up to the compiler; there is no way to tap into it from 
> user level.
>
> There is an extra rule, however. If the object in question is 
> not a range and doesn't have opApply but has opSlice(), then it 
> slices it first, then again checks the 2 ways on that slice - 
> as range or opApply.

  I didn't know that; I knew it attempted a slice, but not that 
the slice has to be a range. That likely needs to be emphasized 
in the documentation.
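  To make sure I have the rule straight, here's a small sketch 
(the names Bits and Slice are just made up for illustration): the 
container itself has no range primitives and no opApply, only an 
opSlice() that returns a range, so foreach falls back to slicing 
it first.

import std.stdio;

struct Slice
{
    bool[] data;
    @property bool front() { return data[0]; }
    @property bool empty() { return data.length == 0; }
    void popFront() { data = data[1 .. $]; }
}

struct Bits
{
    bool[] data;                    // simplified backing store

    // No opApply and no front/popFront/empty here; foreach over
    // Bits is rewritten to foreach over bits[], i.e. this slice.
    Slice opSlice() { return Slice(data); }
}

void main()
{
    auto bits = Bits([true, false, true]);
    foreach (b; bits)               // works via the opSlice() fallback
        writeln(b);
}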

>>  A quick test shows that this currently isn't the case.
>>
>>>> Really depends on the use case. I'm guessing there's enough 
>>>> call for the 64-bit small packed bit array to justify it; 
>>>> otherwise it would be better to throw it out. As you mention 
>>>> later, separating them does seem like a better idea.
>>>
>>> I meant the fact that it has to preserve the current std lib 
>>> behavior *and* have the small string optimization, which 
>>> makes it overly painful and partially defeats the 
>>> optimization (by requiring pointless compact->big conversions 
>>> on slice, etc.).
>>
>> We could always 'dup' on slice instead, but then you can't use 
>> references to it, at which point it's a value type.
>
> Dup on slice is very bad. A slice in the container world (I'm 
> looking at std.container and this is not likely to change) is a 
> view of the same data, thus it's *not* a new container on its 
> own.
>
>> Hmmm. If it's a pure value type, then a separate range makes 
>> more sense to use.
>>
>>>> Hmmm... either would need some code duplication, or a 
>>>> template to turn off/swap the small string optimization. But 
>>>> separating them does seem like a good idea.
>>>
>>> I'd expect a fair chunk of code to be split off into free 
>>> functions, then both containers would make use of them.
>>>
>>> Now note that there is also std.container's Array!bool, which 
>>> has reference semantics and has got to be doing bit 
>>> packing... it could potentially take the place of the big 
>>> BitArray mentioned.
>>
>> I'm aware of the container version, and overall if it can just 
>> replace BitArray as it is, then we might consider only 
>> worrying about a compact version, at which point it would 
>> become a template... We might start seeing BitArray!1024 in 
>> places (or BitString!1024?).
>
> I'm not sure a templated fixed-size bit array will gain that 
> much over one that just has the small optimization. It could be 
> a lot, but somehow I don't expect much; that needs some 
> measurement.

  I agree, but if it's going to be a value type that leans 
heavily on small string optimization with as little referencing 
as possible, then it has to know what size to depend on, since 
resizing beyond the maximum may not be an option if we take that 
part out. Something along the lines of the sketch below is what I 
picture.
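  A minimal sketch of what I mean (BitString here is a 
hypothetical name, not an existing Phobos type; just a 
fixed-capacity value type with no heap allocation):

struct BitString(size_t maxBits)
{
    private enum bitsPerWord = 8 * size_t.sizeof;
    private size_t[(maxBits + bitsPerWord - 1) / bitsPerWord] store;
    private size_t len;

    /// Read a single bit.
    bool opIndex(size_t i) const
    {
        assert(i < len);
        return ((store[i / bitsPerWord] >> (i % bitsPerWord)) & 1) != 0;
    }

    /// Append one bit; capacity is fixed, so it cannot grow past maxBits.
    void opOpAssign(string op : "~")(bool b)
    {
        assert(len < maxBits, "fixed capacity exceeded");
        if (b)
        {
            size_t one = 1;
            store[len / bitsPerWord] |= one << (len % bitsPerWord);
        }
        ++len;
    }

    @property size_t length() const { return len; }
}

unittest
{
    BitString!1024 bs;      // pure value type, lives on the stack
    bs ~= true;
    bs ~= false;
    assert(bs.length == 2 && bs[0] && !bs[1]);
}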

