faster splitter
David Nadlinger via Digitalmars-d
digitalmars-d at puremagic.com
Tue May 31 15:50:37 PDT 2016
On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu
wrote:
> You may want to then try https://dpaste.dzfl.pl/392710b765a9,
> which generates inline code on all compilers. -- Andrei
In general, it might be beneficial to use
ldc.intrinsics.llvm_expect (cf. __builtin_expect) for things like
that in order to optimise basic block placement. (We should
probably have a compiler-independent API for that in core.*, by
the way.) In this case, the skip computation path is probably
small enough for that not to matter much, though.
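
As a rough illustration of the branch-hint idea, here is a minimal C sketch using __builtin_expect, the GCC/Clang builtin referred to above. The LIKELY macro and the find_needle function are hypothetical names made up for this example; they are not taken from the linked paste, and the search loop is only a simplified stand-in for the real skip computation. The assumption is that most haystack positions do not match the last needle byte, so the verification step is marked as the cold path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical portability macro: GCC and Clang provide __builtin_expect;
   on other compilers the hint is simply dropped. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#  define LIKELY(x) (x)
#endif

/* Returns the index of the first occurrence of needle in hay,
   or hay_len if there is none. An empty needle matches at index 0. */
size_t find_needle(const char *hay, size_t hay_len,
                   const char *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    if (needle_len == 0)
        return 0;
    if (needle_len > hay_len)
        return hay_len;

    const char last = needle[needle_len - 1];
    for (size_t i = needle_len - 1; i < hay_len; ++i) {
        /* Hot path (assumed): the last needle byte does not match,
           so just move on to the next position. */
        if (LIKELY(hay[i] != last))
            continue;

        /* Cold path: the last byte matched, verify the preceding bytes. */
        const size_t start = i - (needle_len - 1);
        if (memcmp(hay + start, needle, needle_len - 1) == 0)
            return start;
    }
    return hay_len;
}
```

Whether the hint changes the generated code at all depends on the compiler and optimisation level, so it is worth checking the disassembly rather than assuming an effect.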
Another thing that might be interesting to do (now that you have
a "clever" baseline) is to start counting cycles and make some
comparisons against manual asm/intrinsics implementations. For
short(-ish) needles, PCMPESTRI is probably the most promising
candidate, although I suspect that for \r\n scanning in long
strings in particular, an optimised AVX2 solution might have
higher throughput.
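
To make the SIMD idea concrete, here is a minimal sketch of \r\n scanning with intrinsics. It uses SSE2 (16 bytes per step) rather than AVX2, purely so that it builds on x86-64 with GCC or Clang without extra flags; an AVX2 version would follow the same pattern with 32-byte loads. The function name find_crlf is hypothetical, the code falls back to a plain scalar loop where __SSE2__ is not defined, and nothing here has been benchmarked.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#  include <emmintrin.h>
#endif

/* Returns the index of the first "\r\n" pair in hay[0..len),
   or len if there is none. */
size_t find_crlf(const char *hay, size_t len)
{
    assert(hay != NULL || len == 0);
    size_t i = 0;

#if defined(__SSE2__)
    const __m128i cr = _mm_set1_epi8('\r');
    const __m128i lf = _mm_set1_epi8('\n');

    /* Each iteration reads bytes [i, i + 17), so only run while
       that range stays inside the buffer. */
    for (; i + 17 <= len; i += 16) {
        /* a holds bytes i..i+15, b holds the bytes one position later. */
        const __m128i a = _mm_loadu_si128((const __m128i *)(hay + i));
        const __m128i b = _mm_loadu_si128((const __m128i *)(hay + i + 1));

        /* A lane is set where a has '\r' and the following byte is '\n'. */
        const __m128i hit = _mm_and_si128(_mm_cmpeq_epi8(a, cr),
                                          _mm_cmpeq_epi8(b, lf));
        const int mask = _mm_movemask_epi8(hit);
        if (mask != 0)
            return i + (size_t)__builtin_ctz((unsigned)mask);
    }
#endif

    /* Scalar tail (and full fallback when SSE2 is unavailable). */
    for (; i + 1 < len; ++i) {
        if (hay[i] == '\r' && hay[i + 1] == '\n')
            return i;
    }
    return len;
}
```

The two overlapping unaligned loads keep the logic simple; a tuned implementation would more likely load once and shift or carry the last byte across iterations, which is exactly the kind of choice that cycle counts should decide.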
Of course, these observations are not worth much without
measurements to back them up, but before optimising a generic
search algorithm for short-needle test cases, it seems prudent
to have a solid SIMD baseline to compare against.
— David