New UTF-8 stride function
Dmitry Olshansky
dmitry.olsh at gmail.com
Tue May 28 08:31:02 PDT 2013
28-May-2013 00:42, Martin Nowak wrote:
> On 05/27/2013 09:21 PM, Martin Nowak wrote:
>> > See unittest/benchmark here:
>> > https://gist.github.com/blackwhale/5653927
>> >
>> Looks promising.
>
> This will not detect 0xFF as invalid UTF-8 sequence.
> For sequences with 5 or 6 bytes, that aren't used for unicode, it will
> return a stride of 4.
>
First of all, there is a minor bug in std.utf: it accepts sequences of
5 and 6 bytes. These are explicitly not defined by the Unicode standard,
so they should be rejected as invalid UTF as well.
OK, I just need to consider the next bit, which makes the whole mask
4 bits wide. Thus I need 16 slots in a register.
The 64-bit version will fit just fine in a register: 4*16 = 64.
The 32-bit version will have to pack 2 bits per slot and add 1
afterwards.
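
To make the 64-bit packing concrete, here is a rough sketch in C (not the D code from the linked file). The names STRIDE_TABLE and utf8_stride are hypothetical, and so are the choices to index the table by bits 6..3 of a non-ASCII lead byte and to use 0 as the "invalid" marker. The actual implementation may lay out its slots differently.

```c
#include <stdint.h>

/* Hypothetical layout: 16 slots of 4 bits each, packed into one 64-bit
 * constant. Slot k occupies bits 4k..4k+3 and is selected by bits 6..3
 * of a lead byte whose top bit is set.
 *   slots  0..7   10xxxxxx  continuation byte  -> 0 (invalid as a lead)
 *   slots  8..11  110xxxxx                     -> 2
 *   slots 12..13  1110xxxx                     -> 3
 *   slot  14      11110xxx                     -> 4
 *   slot  15      11111xxx  (0xF8..0xFF)       -> 0 (invalid)
 */
static const uint64_t STRIDE_TABLE = 0x0433222200000000ULL;

/* Returns the stride implied by the lead byte, or 0 if the byte cannot
 * start a sequence under the assumptions above. */
static unsigned utf8_stride(unsigned char c)
{
    if (c < 0x80)
        return 1;                        /* ASCII fast path */
    unsigned slot = (c >> 3) & 0xF;      /* the 4-bit mask */
    return (unsigned)((STRIDE_TABLE >> (4 * slot)) & 0xF);
}
```

With this layout 0xFF and the 5- and 6-byte lead bytes (0xF8..0xFD) map to 0 instead of 4. The sketch looks only at the lead byte, so it still reports 2 for the overlong leads 0xC0 and 0xC1 and 4 for 0xF5..0xF7. Those would need a separate check.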
Here is an updated version that I'm testing again:
https://github.com/blackwhale/gsoc-bench-2012/blob/master/fast_stride.d
--
Dmitry Olshansky