Taking pipeline processing to the next level
finalpatch via Digitalmars-d
digitalmars-d at puremagic.com
Tue Sep 6 07:35:43 PDT 2016
On Tuesday, 6 September 2016 at 14:26:22 UTC, finalpatch wrote:
> On Tuesday, 6 September 2016 at 14:21:01 UTC, finalpatch wrote:
>> Then some template magic figures out that the LCM of the two
>> kernels' pixel widths is 3*4=12, so they are fused together
>> into a composite kernel of pixel width 12. The above line
>> compiles down to a single function invocation, with a main
>> loop that reads the source buffer in 4-pixel steps, calls
>> MySimpleKernel 3 times, then calls AnotherKernel 4 times.
>
> Correction:
> with a main loop that reads the source buffer in *12*-pixel
> steps, calls MySimpleKernel 3 times, then calls AnotherKernel 4
> times.
And of course the key to the speed is that all the function calls
get inlined by the compiler.
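
To make the fusion concrete, here is a minimal sketch in D of how it
could look. It is not the actual library code: the `width` member,
the static `run` function, the `Fused` template, the `process` driver
and the kernel bodies are all hypothetical, chosen only so that
MySimpleKernel is 4 pixels wide and AnotherKernel is 3 pixels wide,
which matches the 3 calls and 4 calls per 12-pixel step described
above. Inlining is left to the optimizer here.

```d
import std.stdio : writeln;

// Hypothetical kernels: each one handles a fixed number of pixels per call.
struct MySimpleKernel
{
    enum size_t width = 4;
    static void run(const(int)[] src, int[] dst)
    {
        foreach (i; 0 .. width) dst[i] = src[i] + 1;
    }
}

struct AnotherKernel
{
    enum size_t width = 3;
    static void run(const(int)[] src, int[] dst)
    {
        foreach (i; 0 .. width) dst[i] = src[i] * 2;
    }
}

// Plain functions, evaluated at compile time (CTFE) when used in an enum.
size_t gcd(size_t a, size_t b)
{
    while (b) { auto t = a % b; a = b; b = t; }
    return a;
}

size_t lcm(size_t a, size_t b) { return a / gcd(a, b) * b; }

// Composite kernel whose width is the LCM of the two kernel widths.
struct Fused(K1, K2)
{
    enum size_t width = lcm(K1.width, K2.width); // 12 for widths 4 and 3

    static void run(const(int)[] src, int[] dst)
    {
        int[width] tmp; // intermediate pixels stay in a small local buffer
        // Loop bounds are compile-time constants: 3 calls, then 4 calls.
        foreach (i; 0 .. width / K1.width)
            K1.run(src[i * K1.width .. (i + 1) * K1.width],
                   tmp[i * K1.width .. (i + 1) * K1.width]);
        foreach (i; 0 .. width / K2.width)
            K2.run(tmp[i * K2.width .. (i + 1) * K2.width],
                   dst[i * K2.width .. (i + 1) * K2.width]);
    }
}

// Main loop: walks the source buffer in steps of the kernel's width.
void process(K)(const(int)[] src, int[] dst)
{
    assert(src.length == dst.length && src.length % K.width == 0);
    for (size_t i = 0; i < src.length; i += K.width)
        K.run(src[i .. i + K.width], dst[i .. i + K.width]);
}

void main()
{
    auto src = new int[24];
    foreach (i, ref v; src) v = cast(int) i;
    auto dst = new int[24];
    process!(Fused!(MySimpleKernel, AnotherKernel))(src, dst);
    writeln(dst[0 .. 4]); // prints [2, 4, 6, 8]
}
```

Because both widths are known at compile time, the two inner loops
have constant trip counts, so an optimizing compiler is free to
unroll them and inline the kernel bodies into the main loop.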