Reducing the cost of autodecoding

Tue Oct 25 12:16:07 PDT 2016

On 10/25/16 12:57 PM, Dmitry Olshansky wrote:
> On 10/12/16 3:53 PM, Andrei Alexandrescu wrote:
>> So we've had a good run with making popFront smaller. In ASCII
>> microbenchmarks with ldc, the speed is indistinguishable from s = s[1 ..
>> $]. Smaller functions make sure that the impact on instruction cache in
>> larger applications is not high.
> [snip]
>> Your mission, should you choose to accept it, is to define a combination
>> front/popFront that reduces the gap.
>>
>
> I'm sooo late to the party. Still a neat trick with UTF lookup table is
> to cram it into a single word with bit packing. Since we only ever
> consider top 5 bits of leading byte of which the top one can be safely
> discarded with a quick < 0x80 check. This leaves us at 4bits -> 16 entry
> table, fits nicely in a 64-bit word with 4bits per entry.

Half of the entries in your table are 0 (every char that starts with 
10). In addition, you only need 2 bits to store the length (max byte 
sequence is 4, subtract 1 from sequence to get 2 bits).

So I think you can fit this into a 16-bit word (8 entries of 2 bits 
each). You just need to pre-check for the invalid sequences (you no 
longer have to check for 0 result):

if(c < 0x80) s = s[1 .. $];
else if(c < 0xc0 || c > 0xf7) throw new Exception("blah");
else s = s[1 + myStride(c) .. $];

Should behave better on 32-bit system.

You could also store 3-bits to avoid extra addition.

-Steve