Reducing the cost of autodecoding
Patrick Schluter via Digitalmars-d
digitalmars-d at puremagic.com
Sat Oct 15 12:11:49 PDT 2016
On Saturday, 15 October 2016 at 19:07:50 UTC, Patrick Schluter
wrote:
> At least with that lookup table below, you can detect isolated
> continuation bytes (192 and 193) and invalid codes (above 244).
192 and 193 can never appear in UTF-8 text: they could only start
overlong encodings, they are not continuation bytes. Continuation
bytes are the values between 128 and 191, and those are not allowed
as the first byte of a sequence, so they should be checked too.
>
> __gshared static immutable ubyte[] charWidthTab = [
> 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
> 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
> ];
>
> Lengths 5 and 6 need not be tested specifically for your goto.
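
To make the indexing explicit, here is a minimal sketch of how the table
could be used. It assumes the table is indexed by (byte - 192), which is
what its 64 entries suggest. The helper name leadWidth and the convention
of returning 0 for "cannot start a sequence" are made up for this
illustration, they are not from the code under discussion.

```d
// Table from the post: expected sequence length for lead bytes 192..255,
// indexed by (byte - 192). An entry of 1 marks a byte that can never start
// a valid sequence (192, 193 and 245..255).
immutable ubyte[64] charWidthTab = [
    1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
];

// Hypothetical helper: length announced by a lead byte, or 0 if the byte
// cannot start a sequence at all.
int leadWidth(ubyte b) pure nothrow @nogc @safe
{
    if (b < 128) return 1;      // ASCII, single byte
    if (b < 192) return 0;      // 128..191: continuation byte in lead position
    immutable int w = charWidthTab[b - 192];
    return w == 1 ? 0 : w;      // a table entry of 1 means invalid lead byte
}

void main()
{
    assert(leadWidth(0x41) == 1);   // 'A'
    assert(leadWidth(0x80) == 0);   // isolated continuation byte
    assert(leadWidth(0xC0) == 0);   // 192: would start an overlong encoding
    assert(leadWidth(0xC2) == 2);   // first valid 2-byte lead
    assert(leadWidth(0xE0) == 3);
    assert(leadWidth(0xF4) == 4);   // 244: last valid lead byte
    assert(leadWidth(0xF5) == 0);   // above 244: invalid
}
```

This only classifies the first byte. The following bytes still have to be
checked for being in 128..191, and overlong 3- and 4-byte forms or
surrogates are not caught by the table alone.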