Taking pipeline processing to the next level

Tue Sep 6 17:21:23 PDT 2016

On 7 September 2016 at 07:11, finalpatch via Digitalmars-d
<digitalmars-d at puremagic.com> wrote:
> On Tuesday, 6 September 2016 at 14:47:21 UTC, Manu wrote:
>
>>> with a main loop that reads the source buffer in *12*-pixel steps, calls
>>> MySimpleKernel 3 times, then calls AnotherKernel 4 times.
>>
>>
>> Those are interesting thoughts. What did you do when buffers weren't a
>> multiple of the kernel width?
>
>
> The end of a scan line is special-cased. If I need 12 pixels for the last
> iteration but there are only 8 left, an instance of Kernel::InputVector is
> allocated on the stack, the 8 remaining pixels are memcpy'd into it, and it
> is then sent to the kernel. The kernel's output is likewise assigned to a
> stack variable first, then 8 pixels are memcpy'd to the output buffer.

Right, and this is a classic problem with this sort of function; it is
only more efficient if numElements is suitably long.
See, I often wonder if it would be worth being able to provide both
functions, a scalar and array version, and have the algorithms select
between them intelligently.