On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
>
> It made little difference: LDC compiled into AVX2 vectorized
> addition (vpmovzxbq & vpaddq.)

Measurements without -mcpu=native:

overhead              0.336s
bytes                 0.610s
without branch hints  0.852s
code pasted           0.766s