Benchmark memchar (with GCC builtins)
Iakh via Digitalmars-d
digitalmars-d at puremagic.com
Fri Oct 30 14:29:46 PDT 2015
I continue to play with SIMD. I tried to use std.simd, but a lot
of it is still unimplemented. I also gave up on core.simd.__simd
because the PMOVMSKB instruction is not implemented there.
Today I was playing with memchr for gdc:
memchr: http://www.cplusplus.com/reference/cstring/memchr/
My implementations with benchmark:
http://dpaste.dzfl.pl/4c46c0cf340c
Benchmark results:
-----
Naive: 21.9 TickDuration(136456491)
SIMD: 3.04 TickDuration(18920182)
SIMDM: 2.44 TickDuration(15232176)
SIMDU: 1.8 TickDuration(11210454)
C: 1 TickDuration(6233963)
-----
The middle column is the duration relative to the C implementation
(core.stdc.string).
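For anyone checking the numbers: the relative figure is the run's tick count divided by the tick count of the C run. A minimal sketch, where relativeTo is a hypothetical helper name and not part of the benchmark code:

```d
// Hypothetical helper: duration of one run relative to a baseline run.
double relativeTo(long ticks, long baselineTicks)
{
    return cast(double) ticks / baselineTicks;
}

unittest
{
    enum long cTicks = 6233963;   // C (core.stdc.string) from the results above
    assert(relativeTo(cTicks, cTicks) == 1.0);
    // Naive: 136456491 / 6233963 is roughly 21.9
    immutable naive = relativeTo(136456491, cTicks);
    assert(naive > 21.8 && naive < 22.0);
}
```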
memchrSIMD splits the input into three parts: an unaligned beginning,
an unaligned end, and an aligned middle.
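The three-part split could look roughly like the sketch below. It is not the code from the paste: memchrSplit and scanBlock are hypothetical names, and scanBlock is a scalar stand-in for the SIMD compare so that the sketch compiles with any D compiler.

```d
// Hypothetical stand-in for the SIMD step: scan one 16-byte block.
const(ubyte)* scanBlock(const(ubyte)* block, ubyte value)
{
    foreach (i; 0 .. 16)
        if (block[i] == value)
            return block + i;
    return null;
}

// Sketch: unaligned head, aligned middle in 16-byte blocks, unaligned tail.
const(ubyte)* memchrSplit(const(ubyte)* ptr, ubyte value, size_t num)
{
    enum size_t blockSize = 16;
    const(ubyte)* end = ptr + num;

    // Bytes before the next 16-byte boundary, clamped to the input length.
    size_t headLen = (blockSize - cast(size_t) ptr % blockSize) % blockSize;
    if (headLen > num)
        headLen = num;
    // Largest multiple of 16 that fits after the head.
    size_t midLen = (num - headLen) / blockSize * blockSize;

    const(ubyte)* alignedBegin = ptr + headLen;
    const(ubyte)* alignedEnd = alignedBegin + midLen;

    for (; ptr < alignedBegin; ++ptr)          // unaligned beginning
        if (*ptr == value)
            return ptr;
    for (; ptr < alignedEnd; ptr += blockSize) // aligned middle
        if (auto hit = scanBlock(ptr, value))
            return hit;
    for (; ptr < end; ++ptr)                   // unaligned end
        if (*ptr == value)
            return ptr;
    return null;
}
```

The head and tail are each shorter than 16 bytes, so the scalar loops there cost little compared with the middle part.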
memchrSIMDM uses this code instead of pmovmskb:
------
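// Read the compare result one Mask-sized word at a time.
if (Mask mask = *cast(Mask*)(result.array.ptr))
{
    // Match in the first word: bit index / BitsInByte is the byte index.
    return ptr + bsf(mask) / BitsInByte;
}
else if (Mask mask = *cast(Mask*)(result.array.ptr + Mask.sizeof))
{
    // Match in the second word: offset by the size of the first word.
    return ptr + bsf(mask) / BitsInByte + cast(int)Mask.sizeof;
}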
------
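To try the mask trick in isolation, here is a small self-contained sketch. It assumes Mask is ulong, BitsInByte is 8, and a little-endian target such as x86 (the definitions are in the paste, not in this mail); firstMatch is a hypothetical name, and a plain ubyte[16] stands in for the vector result.

```d
import core.bitop : bsf;

alias Mask = ulong;      // assumption: 8-byte mask word
enum BitsInByte = 8;

// Index of the first nonzero byte in a 16-byte compare result, or -1.
// Assumes little-endian layout, so the lowest set bit is the first byte.
int firstMatch(ref const ubyte[16] result)
{
    const(Mask)* words = cast(const(Mask)*) result.ptr;
    if (Mask mask = words[0])
        return bsf(mask) / BitsInByte;
    if (Mask mask = words[1])
        return bsf(mask) / BitsInByte + cast(int) Mask.sizeof;
    return -1;
}

unittest
{
    align(16) ubyte[16] r;   // all zero: no lane matched
    assert(firstMatch(r) == -1);
    r[11] = 0xFF;            // match in the second word
    assert(firstMatch(r) == 11);
    r[5] = 0xFF;             // earlier match in the first word wins
    assert(firstMatch(r) == 5);
}
```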
memchrSIMDU (unaligned) applies SIMD instructions starting from the
first array element.
SIMD part of the function:
------
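ubyte16 niddles;
niddles.ptr[0..16] = value;   // broadcast the searched byte to all 16 lanes
ubyte16 result;
ubyte16 arr;
for (; ptr < alignedEnd; ptr += 16)
{
    arr.ptr[0..16] = ptr[0..16];   // load the next 16 bytes
    result = __builtin_ia32_pcmpeqb128(arr, niddles);   // 0xFF in each matching lane
    int i = __builtin_ia32_pmovmskb128(result);   // one bit per lane
    if (i != 0)
    {
        return ptr + bsf(i);   // lowest set bit is the first matching byte
    }
}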
------
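For readers who have not used these two builtins: pcmpeqb compares 16 bytes lane by lane, and pmovmskb packs the top bit of each lane into a 16-bit integer, so bsf on that integer gives the byte index directly. A scalar emulation of the pair, as a sketch only (emulatedMoveMask is a hypothetical name, not a GCC builtin):

```d
import core.bitop : bsf;

// Scalar emulation of pcmpeqb followed by pmovmskb on one 16-byte block:
// bit k of the result is set when block[k] == value.
int emulatedMoveMask(const(ubyte)* block, ubyte value)
{
    int mask = 0;
    foreach (k; 0 .. 16)
        if (block[k] == value)
            mask |= 1 << k;
    return mask;
}

unittest
{
    ubyte[16] block;          // all zero
    block[3] = 7;
    block[9] = 7;
    int m = emulatedMoveMask(block.ptr, 7);
    assert(m == ((1 << 3) | (1 << 9)));
    assert(bsf(m) == 3);      // first matching byte, no division needed
}
```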