Reducing the cost of autodecoding

Wed Oct 12 18:36:44 PDT 2016

On 10/12/2016 09:35 PM, Stefan Koch wrote:
> On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote:
>> On 10/12/2016 08:41 PM, safety0ff wrote:
>>> On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
>>>>
>>>> It made little difference: LDC compiled it into AVX2 vectorized addition
>>>> (vpmovzxbq & vpaddq).
>>>
>>> Measurements without -mcpu=native:
>>> overhead              0.336s
>>> bytes                 0.610s
>>> without branch hints  0.852s
>>> code pasted           0.766s
>>
>> So we should be able to reduce overhead by means of proper code
>> arrangement and interplay of inlining and outlining. The prize,
>> however, would be to get the AVX instructions for ASCII going. Is that
>> possible? -- Andrei
>
> AVX for ASCII?
> What are you referring to?
> Most text processing is terribly incompatible with SIMD.
> SSE 4.2 has a few instructions that do help, but as far as I am aware it
> is not yet widely available.

Oh, OK, so it's that checksum in particular that got optimized. Bad
benchmark! Bad! -- Andrei
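
For readers following along, here is a minimal sketch of why a checksum-style benchmark can mislead. It is written in C rather than D (an assumption, made only to keep the example self-contained), and the names sum_bytes, sum_code_points, and decode_slow are hypothetical; this is not the benchmark code from the thread. The first loop has no data-dependent control flow, which is the shape of loop reported above as compiling to widening vector adds. The second decodes UTF-8 with an ASCII fast path and a separate slow-path function, so each iteration's stride depends on the data. The decoder is deliberately simplified: it does not reject overlong encodings or surrogates, and it maps malformed input to U+FFFD. Whether either loop is actually vectorized depends on the compiler, flags, and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain byte checksum: every iteration does the same work regardless of
   the data, so an optimizer is free to widen and add many bytes at once. */
uint64_t sum_bytes(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += s[i];
    return total;
}

/* Slow path, kept as a separate function so the hot loop stays small.
   Decodes one multi-byte sequence starting at s[*i] and advances *i.
   Simplified (assumption): malformed or truncated input yields U+FFFD
   and advances by one byte; overlongs and surrogates are not rejected. */
static uint32_t decode_slow(const unsigned char *s, size_t n, size_t *i)
{
    unsigned char b0 = s[*i];
    size_t len;
    uint32_t cp;

    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else { *i += 1; return 0xFFFD; }          /* stray continuation byte */

    if (n - *i < len) { *i += 1; return 0xFFFD; }  /* truncated sequence */

    for (size_t k = 1; k < len; ++k) {
        unsigned char b = s[*i + k];
        if ((b & 0xC0) != 0x80) { *i += 1; return 0xFFFD; }
        cp = (cp << 6) | (uint32_t)(b & 0x3F);
    }
    *i += len;
    return cp;
}

/* Code-point checksum: ASCII bytes take the inline fast path, anything
   else goes through decode_slow. The stride varies with the input. */
uint64_t sum_code_points(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    uint64_t total = 0;
    size_t i = 0;
    while (i < n) {
        if (s[i] < 0x80) {
            total += s[i];
            ++i;
        } else {
            total += decode_slow(s, n, &i);
        }
    }
    return total;
}
```

On the three bytes of "abc" both functions return 294. On the two bytes of "é" (0xC3 0xA9), sum_bytes returns 364 while sum_code_points returns 233, so the two loops are not interchangeable as a measure of decoding cost.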