(SIMD) Optimized multi-byte chunk scanning
Cecil Ward via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sat Aug 26 09:39:18 PDT 2017
On Friday, 25 August 2017 at 18:52:57 UTC, Nordlöw wrote:
> On Friday, 25 August 2017 at 09:40:28 UTC, Igor wrote:
>> As for a nice reference of intel intrinsics:
>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/
>
> Wow, what a fabulous UX!
The pcmpestri instruction is probably what you are looking for?
There is a useful resource in the Intel optimisation guide. There
is also an Intel article about speeding up XML parsing with this
instruction, but if memory serves it's really messy, a right
palaver. A lot depends on how much control, if any, you have over
the input data; typically none, I quite understand.
Based on this article,
https://graphics.stanford.edu/~seander/bithacks.html
I wrote a short D routine to help me learn the language, as I was
thinking about a faster strlen using larger-sized gulps. The above
article has a good test for whether or not a wide word contains a
particular byte value somewhere in it. I wrote a
bool hasZeroByte( in uint64_t x )
function based on that method.
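
For reference, a minimal sketch of what such a function might look
like, following the "determine if a word has a zero byte" test from the
bithacks page. This is not the routine I actually wrote; the constant
names and the hasByte helper are made up for illustration.

```d
import core.stdc.stdint : uint64_t;

/// True if at least one of the eight bytes of x is zero.
/// Subtracting 1 from every byte borrows into bit 7 only for bytes
/// that were zero or already had bit 7 set; "& ~x" discards the latter.
bool hasZeroByte(in uint64_t x) pure nothrow @nogc @safe
{
    enum uint64_t lows  = 0x0101010101010101UL; // 0x01 in every byte
    enum uint64_t highs = 0x8080808080808080UL; // 0x80 in every byte
    return ((x - lows) & ~x & highs) != 0;
}

/// Hypothetical helper: true if any byte of x equals b.
/// XOR with b replicated into every byte turns matching bytes into
/// zero, so the zero-byte test can be reused.
bool hasByte(in uint64_t x, in ubyte b) pure nothrow @nogc @safe
{
    enum uint64_t lows = 0x0101010101010101UL;
    return hasZeroByte(x ^ (lows * b));
}
```

The test only says whether a matching byte is present, not where it
is, so a scanning loop still has to locate the byte within the word
once the test fires.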
I'm intending to write a friendlier D convenience routine to give
access to inline pcmpestri code generation in GDC when I get
round to it (a single instruction, fully inlined and flexibly
optimised at compile-time, with no subroutine call wrapped around
the instruction).
Agner Fog's libraries and articles are superb; take a look. He
must have published code to deal with these C standard library
byte string processing functions efficiently using wide aligned
machine words, unless I'm getting very forgetful.
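
To illustrate the general idea of scanning with wide aligned machine
words, here is a rough sketch of a strlen that reads eight bytes at a
time. It is not Agner Fog's code, just the well-known technique under
some assumptions: the name wideStrlen is made up, the input is a
zero-terminated C string, and the platform tolerates reading a whole
aligned 8-byte word that extends past the terminator (an aligned word
never crosses a page boundary, but the read is still technically
outside the string).

```d
import core.stdc.stdint : uint64_t;

/// Zero-byte test from the bithacks page (repeated here so the
/// sketch is self-contained).
bool hasZeroByte(in uint64_t x) pure nothrow @nogc @safe
{
    enum uint64_t lows  = 0x0101010101010101UL;
    enum uint64_t highs = 0x8080808080808080UL;
    return ((x - lows) & ~x & highs) != 0;
}

/// Hypothetical strlen working in 8-byte gulps.
size_t wideStrlen(const(char)* s) nothrow @nogc
{
    const(char)* p = s;

    // Step byte-by-byte until p is 8-byte aligned.
    while ((cast(size_t) p & 7) != 0)
    {
        if (*p == 0)
            return cast(size_t)(p - s);
        ++p;
    }

    // Skip whole aligned words that contain no zero byte.
    auto w = cast(const(uint64_t)*) p;
    while (!hasZeroByte(*w))
        ++w;

    // The terminator is somewhere in this word; find it bytewise.
    p = cast(const(char)*) w;
    while (*p != 0)
        ++p;
    return cast(size_t)(p - s);
}
```

The final bytewise loop could be replaced by a bit-scan on the mask
that the zero-byte test produces, which is where the SIMD and
pcmpestri approaches start to pay off.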
A bit of googling?