Is sorted using SIMD instructions

Marco Leise Marco.Leise at gmx.de
Thu Apr 12 10:04:13 UTC 2018


Am Thu, 12 Apr 2018 09:37:58 +0000
schrieb Stefan Koch <uplink.coder at googlemail.com>:

> On Thursday, 12 April 2018 at 07:25:27 UTC, Per Nordlöw wrote:
> > Neither GCC, LLVM nor ICC can auto-vectorize (and use SIMD) the 
> > seemly simple function
> >
> > bool is_sorted(const int32_t* input, size_t n) {
> >     if (n < 2) {
> >         return true;
> >     }
> >
> >     for (size_t i=0; i < n - 1; i++) {
> >         if (input[i] > input[i + 1])
> >             return false;
> >     }
> >
> >     return true;
> > }
> >
> > Can D's compilers do better?
> >
> > See http://0x80.pl/notesen/2018-04-11-simd-is-sorted.html  
> 
> I highly doubt it.
> this cannot be auto-vectorized.
> There is no way to proof the loop dimensions.
> 
> I'd be a different story if you unrolled the loop by hand.
> I guess then you'd see gcc and clang putting some simd in there 
> maybe.

I had it triggered in gcc one day, when I changed from 3 ubyte
color components to 4 in a struct and I do believe in both
cases it was padded to 4 bytes. One should probably read a bit
into what the respective backends can detect and modify code
to match that. D compilers are ultimately limited by what the
backends offer in terms of auto-vectorizing non-SIMD code.

It is easy to vectorize a loop without pointer aliasing that
performs C=A+B, but your code above requires complex logic.
There is no SIMD instruction in the SSE series that would
check if an array is sorted as far as I know. Instead you'd
load one vector starting at index 0 and another starting at
index 1 of the same "input" array and compare them
piece-wise. That's an difficult problem for the compiler:
* one of the vectors is always going to be an unaligned load,
  which may be degrade performance
* most of the data is loaded twice
You could alternatively load only one vector and shuffle that
to compare N-1 items at a time, but at that point it feels
like asking the compiler to automatically create source code
for your problem. As if one just says: "Hey, I need to print
all primes up to 100." And the compiler understands what to do.

-- 
Marco



More information about the Digitalmars-d mailing list