Reducing the cost of autodecoding
Dmitry Olshansky via Digitalmars-d
digitalmars-d at puremagic.com
Tue Oct 25 09:57:34 PDT 2016
On 10/12/16 3:53 PM, Andrei Alexandrescu wrote:
> So we've had a good run with making popFront smaller. In ASCII
> microbenchmarks with ldc, the speed is indistinguishable from s = s[1 ..
> $]. Smaller functions make sure that the impact on instruction cache in
> larger applications is not high.
[snip]
> Your mission, should you choose to accept it, is to define a combination
> front/popFront that reduces the gap.
>
I'm sooo late to the party. Still a neat trick with UTF lookup table is
to cram it into a single word with bit packing. Since we only ever
consider top 5 bits of leading byte of which the top one can be safely
discarded with a quick < 0x80 check. This leaves us at 4bits -> 16 entry
table, fits nicely in a 64-bit word with 4bits per entry.
// construct in-word lookup table
ulong bitTable()
{
import std.utf;
ulong table;
for(size_t i = 128; i<256; i+= 8)
{
char[1] c = [ cast(char)i ];
try{
ulong s = std.utf.stride(c[0..1], 0);
table |= s<<(4*((i-128)>>3));
}
catch(Exception){
// keep zeros
}
}
return table;
}
uint myStride()(char c)
{
enum table = bitTable;
return (table >> ((c & 0x7f) >> 3)) & 0xF;
}
void myPopFront()(ref char[] s)
{
char c = s[0];
if(c < 0x80)
s = s[1..$];
else {
uint step = myStride(c);
if(!step) throw new Exception("blah");
s = s[step..$];
}
}
Accordingly myFront could be modified to use the same technique.
Can't check perf with LDC at the moment sadly.
>
> Andrei
More information about the Digitalmars-d
mailing list