More on vectorized comparisons
Sean Cavanaugh
WorksOnMyMachine at gmail.com
Thu Aug 23 11:27:13 PDT 2012
On 8/22/2012 7:19 PM, bearophile wrote:
> Some time ago I have suggested to add support to vector comparisons in
> D, because this is sometimes useful and in the modern SIMD units there
> is hardware support for such operations:
>
>
> I think that code is semantically equivalent to:
>
> void main() {
>     double[] a = [1.0, 1.0, -1.0, 1.0, 0.0, -1.0];
>     double[] b = [10, 20, 30, 40, 50, 60];
>     double[] c = [1, 2, 3, 4, 5, 6];
>     foreach (i; 0 .. a.length)
>         if (a[i] > 0)
>             b[i] += c[i];
> }
>
>
> After that code b is:
> [11, 22, 30, 44, 50, 60]
>
>
> This means the contents of the 'then' branch of the vectorized
> comparison is done only on items of b and c where the comparison has
> given true.
>
> This looks useful. Is it possible to implement this in D, and do you
> like it?
Well, right now the binary operators == != >= <= > and < are required to
return bool instead of allowing a user-defined type, which prevents a
lot of the sugar you would want to make the code nice to write. Without
the sugar, the code ends up like this:
foreach (i; 0 .. a.length)
{
    float4 mask = greaterThan(a[i], float4(0, 0, 0, 0));
    b[i] = select(mask, b[i] + c[i], b[i]);
}
In GPU shader land this expression is at least simpler to write:
foreach (i; 0 .. a.length)
{
    b[i] = (a[i] > 0) ? (b[i] + c[i]) : b[i];
}
All of these implementations are equivalent and remove the branch from
the code flow, which is pretty nice for the CPU pipeline. In SIMD the
comparisons generate masks into a register which you can immediately
use. On modern (SSE4.1) CPUs the select is a single instruction; on older
ones it takes three: (mask & A) | (~mask & B), but it's all better than a
real branch.
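To make the (mask & A) | (~mask & B) form concrete, here is a rough scalar sketch in plain D. It only models the lane-by-lane behavior; it is not real SIMD code. Float4, Mask4, greaterThan, select and addWherePositive are hypothetical names made up for illustration, not anything from Phobos or core.simd.

```d
// Hypothetical stand-ins for a 4-lane float vector and its compare mask.
alias Float4 = float[4];
alias Mask4 = uint[4];

/// Per-lane a > b: each lane becomes all ones (true) or all zeros (false).
Mask4 greaterThan(Float4 a, Float4 b)
{
    Mask4 m;
    foreach (i; 0 .. 4)
        m[i] = a[i] > b[i] ? 0xFFFF_FFFF : 0;
    return m;
}

/// Per-lane (mask & x) | (~mask & y), applied to the raw float bits.
Float4 select(Mask4 mask, Float4 x, Float4 y)
{
    Float4 r;
    foreach (i; 0 .. 4)
    {
        uint xb = *cast(uint*)&x[i];   // reinterpret the float as its bits
        uint yb = *cast(uint*)&y[i];
        uint rb = (mask[i] & xb) | (~mask[i] & yb);
        r[i] = *cast(float*)&rb;
    }
    return r;
}

/// Branch-free form of "if (a > 0) b += c", four lanes at a time.
Float4 addWherePositive(Float4 a, Float4 b, Float4 c)
{
    Float4 zero = [0f, 0f, 0f, 0f];
    Float4 sum;
    sum[] = b[] + c[];
    return select(greaterThan(a, zero), sum, b);
}
```

Feeding it the first four elements of the quoted example (a = [1, 1, -1, 1], b = [10, 20, 30, 40], c = [1, 2, 3, 4]) gives [11, 22, 30, 44], matching the scalar loop. Note that the sum is always computed and then thrown away for the lanes where the mask is zero.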
If you have a large amount of code that needs a real branch, you can take
the mask generated by the compare, extract it into a general-purpose CPU
register, and test it for zero, nonzero, or specific bits set. A float4
comparison ends up generating 4 bits, one per lane, so the code with a
real branch looks like:
if (any(a[i] > 0))
{
    // do stuff if any of a[i] are greater than zero
}
if (all(a[i] > 0))
{
    // do stuff if all of a[i] are greater than zero
}
if ((getMask(a[i] > 0) & 0x7) == 0x7)
{
    // do stuff if the first three elements are greater than zero
}