Reducing the cost of autodecoding

Tue Oct 25 12:32:40 PDT 2016

On Tuesday, 25 October 2016 at 19:16:07 UTC, Steven Schveighoffer 
wrote:
> On 10/25/16 12:57 PM, Dmitry Olshansky wrote:
>> On 10/12/16 3:53 PM, Andrei Alexandrescu wrote:
>>> So we've had a good run with making popFront smaller. In ASCII
>>> microbenchmarks with ldc, the speed is indistinguishable from 
>>> s = s[1 ..
>>> $]. Smaller functions make sure that the impact on 
>>> instruction cache in
>>> larger applications is not high.
>> [snip]
>>> Your mission, should you choose to accept it, is to define a 
>>> combination
>>> front/popFront that reduces the gap.
>>>
>>
>> I'm sooo late to the party. Still a neat trick with UTF lookup 
>> table is
>> to cram it into a single word with bit packing. Since we only 
>> ever
>> consider top 5 bits of leading byte of which the top one can 
>> be safely
>> discarded with a quick < 0x80 check. This leaves us at 4bits 
>> -> 16 entry
>> table, fits nicely in a 64-bit word with 4bits per entry.
>
> Half of the entries in your table are 0 (every char that starts 
> with 10). In addition, you only need 2 bits to store the length 
> (max byte sequence is 4, subtract 1 from sequence to get 2 
> bits).
>
> So I think you can fit this into a 16-bit word (8 entries of 2 
> bits each). You just need to pre-check for the invalid 
> sequences (you no longer have to check for 0 result):
>
> if(c < 0x80) s = s[1 .. $];
> else if(c < 0xc0 || c > 0xf7) throw new Exception("blah");
> else s = s[1 + myStride(c) .. $];
>
> Should behave better on 32-bit system.
>
> You could also store 3-bits to avoid extra addition.
>
> -Steve

The whole point of a LUT to begin with is to reduce instructions.