color lib

Manu via Digitalmars-d digitalmars-d at puremagic.com
Sun Oct 9 06:34:29 PDT 2016


On 9 October 2016 at 18:25, Nicholas Wilson via Digitalmars-d
<digitalmars-d at puremagic.com> wrote:
> On Sunday, 9 October 2016 at 05:21:32 UTC, Manu wrote:
>>
>> On 9 October 2016 at 14:03, Nicholas Wilson via Digitalmars-d
>> <digitalmars-d at puremagic.com> wrote:
>>>
>>> How far would `r.inBatchesOf!(N)` go in terms of compiler optimisations
>>> (e.g. vectorisation) if N is a power of 2?
>>>
>>> import std.range.primitives;
>>>
>>> auto inBatchesOf(size_t N, R)(R r)
>>> if (N != 0 && isInputRange!R && hasLength!R)
>>> {
>>>     struct InBatchesOfN
>>>     {
>>>         R r;
>>>         ElementType!(R)[N] batch;
>>>         bool done; // true once the last batch has been consumed
>>>
>>>         this(R _r)
>>>         {
>>>             // could have overloads where undefined elements ==
>>>             // ElementType!(R).init
>>>             assert(_r.length % N == 0);
>>>             r = _r;
>>>             done = r.empty;
>>>             if (!done) fill();
>>>         }
>>>
>>>         // Copy the next N elements of r into batch.
>>>         private void fill()
>>>         {
>>>             foreach (i; 0 .. N)
>>>             {
>>>                 batch[i] = r.front;
>>>                 r.popFront();
>>>             }
>>>         }
>>>
>>>         bool empty() { return done; }
>>>         auto front() { return batch; }
>>>         void popFront()
>>>         {
>>>             // r.empty here means the batch in front was the last one
>>>             if (r.empty) done = true;
>>>             else fill();
>>>         }
>>>     }
>>>
>>>     return InBatchesOfN(r);
>>> }
>>
>>
>> Well the trouble is the lambda that you might give to 'map' won't work
>> anymore. Operators don't work on batches, you need to use a completely
>> different API, and I think that's unfortunate.
>
>
> How?  All you need is an extra `each`, e.g.
> r.inBatchesOf!(8).each!(a => a[].map!(convertColor!RGBA8))
>
> perhaps define a helper function for it that does each + the explicit slice
> + map, but it certainly doesn't scream completely different API to me.

As you demonstrate, convertColor doesn't accept RGBA8[16]; it accepts
a single RGBA8. There's no way the optimiser will be able to magic up
an efficient inline of convertColor that works on 16 elements at a
time, but I could easily write a super-fast version by hand.

My point about the separate API is that any function that works on a
single element would need a complement of functions that work on 'n'
elements, where 'n' is some context-specific number of elements that
suits that particular workload.
Now, that's conceivable, and it's even possible to write the magic meta
that calls these functions so that it works out whether there is a
batch overload and calls it if it can, but we're miles away from
std.algorithm and common ranges now.
The other issue is that every such efficient batch version would need
to be hand-written, and that sucks because there are too many
permutations.
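
To make the "magic meta" concrete, here is a minimal sketch of a
dispatcher that checks at compile time whether a batch overload exists
and falls back to the per-element version otherwise. Everything in it
is an assumption for illustration: `scale`, `negate` and `applyBatch`
are hypothetical names, not part of the color lib or Phobos, and the
batch size of 4 is arbitrary.

```d
// Hypothetical per-element function (a stand-in for convertColor).
int scale(int x) { return x * 2; }

// Hypothetical hand-written batch overload, 4 elements at a time.
int[4] scale(int[4] xs)
{
    int[4] result;
    result[] = xs[] * 2; // array operation over the whole batch
    return result;
}

// Hypothetical function that has only a per-element version.
int negate(int x) { return -x; }

// Calls fn on the whole batch if such an overload compiles; otherwise
// applies the per-element fn to each item in turn.
auto applyBatch(alias fn, T, size_t N)(T[N] batch)
{
    static if (__traits(compiles, fn(batch)))
    {
        return fn(batch);
    }
    else
    {
        typeof(fn(batch[0]))[N] result;
        foreach (i, x; batch)
            result[i] = fn(x);
        return result;
    }
}

unittest
{
    int[4] batch = [1, 2, 3, 4];
    int[4] doubled = [2, 4, 6, 8];
    int[4] negated = [-1, -2, -3, -4];
    assert(applyBatch!scale(batch) == doubled);  // batch overload
    assert(applyBatch!negate(batch) == negated); // per-element fallback
}
```

The dispatch itself is cheap to write; the cost is that each fast path
(here the `int[4]` overload of `scale`) still has to be written by
hand for every function and batch size that matters.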

