faster splitter

Tue May 31 15:50:37 PDT 2016

On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu 
wrote:
> You may want to then try https://dpaste.dzfl.pl/392710b765a9, 
> which generates inline code on all compilers. -- Andrei

In general, it might be beneficial to use 
ldc.intrinsics.llvm_expect (cf. __builtin_expect) for things like 
that in order to optimise basic block placement. (We should 
probably have a compiler-independent API for that in core.*, by 
the way.) In this case, the skip computation path is probably 
small enough for that not to matter much, though.
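
To illustrate what such a hint looks like, here is a minimal sketch in C using __builtin_expect, which is what llvm_expect corresponds to. The UNLIKELY macro and the find_byte function are made-up names for illustration and are not taken from the splitter code under discussion. The hint only says that a match on any single iteration is the rare case; whether it changes the generated code at all depends on the compiler and would need to be checked in the disassembly.

```c
#include <stddef.h>

/* Branch hint for GCC/Clang; expands to the plain condition elsewhere.
   UNLIKELY is a hypothetical helper name, not part of any library. */
#if defined(__GNUC__) || defined(__clang__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define UNLIKELY(x) (x)
#endif

/* Returns the index of the first byte equal to c, or len if there is none.
   The hint tells the compiler that "no match" is the common per-iteration
   outcome, so the loop body can be laid out as the fall-through path. */
size_t find_byte(const unsigned char *hay, size_t len, unsigned char c)
{
    for (size_t i = 0; i < len; ++i) {
        if (UNLIKELY(hay[i] == c))
            return i;
    }
    return len;
}
```

Calling find_byte(buf, n, '\r') returns n when no carriage return is present, so the caller can tell "not found" from a match without a separate flag.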

Another thing that might be interesting to do (now that you have 
a "clever" baseline) is to start counting cycles and make some 
comparisons against manual asm/intrinsics implementations. For 
short(-ish) needles, PCMPESTRI is probably the most promising 
candidate, although I suspect that for \r\n scanning in long 
strings in particular, an optimised AVX2 solution might have 
higher throughput.
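
As a starting point for such a comparison, the following is a rough, unbenchmarked sketch in C of what an AVX2 scan for \r\n could look like. All function names are hypothetical. It assumes GCC or Clang on x86 (for the target attribute and __builtin_cpu_supports) and falls back to a scalar loop everywhere else. It is meant as a baseline to measure against, not as a tuned implementation.

```c
#include <stddef.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#  include <immintrin.h>
#  define HAVE_AVX2_PATH 1
#endif

/* Scalar reference: index of the first "\r\n" pair, or len if none. */
size_t find_crlf_scalar(const char *s, size_t len)
{
    for (size_t i = 0; i + 1 < len; ++i)
        if (s[i] == '\r' && s[i + 1] == '\n')
            return i;
    return len;
}

#ifdef HAVE_AVX2_PATH
/* Compares 32 bytes against '\r' and the 32 bytes one position later
   against '\n'; a set bit in the combined mask marks a "\r\n" start. */
__attribute__((target("avx2")))
static size_t find_crlf_avx2(const char *s, size_t len)
{
    const __m256i cr = _mm256_set1_epi8('\r');
    const __m256i lf = _mm256_set1_epi8('\n');
    size_t i = 0;

    /* Each step reads bytes i .. i+32, so 33 bytes must be available. */
    for (; i + 33 <= len; i += 32) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(s + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(s + i + 1));
        __m256i m = _mm256_and_si256(_mm256_cmpeq_epi8(a, cr),
                                     _mm256_cmpeq_epi8(b, lf));
        unsigned mask = (unsigned)_mm256_movemask_epi8(m);
        if (mask)
            return i + (size_t)__builtin_ctz(mask);
    }
    /* Remaining tail (fewer than 33 bytes) is handled by the scalar loop. */
    return i + find_crlf_scalar(s + i, len - i);
}
#endif

/* Dispatches at run time; uses the scalar loop when AVX2 is unavailable. */
size_t find_crlf(const char *s, size_t len)
{
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2"))
        return find_crlf_avx2(s, len);
#endif
    return find_crlf_scalar(s, len);
}
```

The two overlapping unaligned loads keep the sketch simple at the cost of reading most bytes twice. A measured implementation would likely want to reuse the previous vector instead, and a PCMPESTRI variant would be the natural thing to time against it for short needles.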

Of course these observations are not very valuable without 
backing them up with measurements, but it seems like before 
optimising a generic search algorithm for short-needle test 
cases, having one's eyes on a solid SIMD baseline would be a 
prudent thing to do.

  — David