Reducing the cost of autodecoding
Patrick Schluter via Digitalmars-d
digitalmars-d at puremagic.com
Sun Oct 16 05:48:11 PDT 2016
On Sunday, 16 October 2016 at 10:05:37 UTC, Patrick Schluter wrote:
> On Sunday, 16 October 2016 at 08:43:23 UTC, Uplink_Coder wrote:
>> On Sunday, 16 October 2016 at 07:59:16 UTC, Patrick Schluter wrote:
>
>> This looks quite slow.
>> We already have a correct version in utf.decodeImpl.
>> The goal here was to find a small and fast alternative.
>
> I know, but it has to be correct before it can be fast.
> The code is simple and the checks can easily be removed. Here
> is the version without the overlong, invalid-sequence and
> code-point checks.
>
> dchar myFront3(ref char[] str)
> {
>     dchar c0 = str.ptr[0];
>     if(c0 < 0x80) {
>         return c0;
>     }
>     else if(str.length > 1) {
>         dchar c1 = str.ptr[1];
>         if(c0 < 0xE0) {
>             return ((c0 & 0x1F) << 6)|(c1 & 0x3F);
>         }
>         else if(str.length > 2) {
>             dchar c2 = str.ptr[2];
>             if(c0 < 0xF0) {
>                 return ((c0 & 0x0F) << 12)|((c1 & 0x3F) << 6)|(c2 & 0x3F);
>             }
>             else if(str.length > 3) {
>                 dchar c3 = str.ptr[3];
>                 if(c0 < 0xF5) {
>                     return((c0 & 0x07) << 16)|((c1 & 0x3F) << 12)|((c2 & 0x3F) << 6)|(c3 & 0x3F);
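Of course, this line is wrong: it should shift by 18, not 16, because the
3 payload bits of the lead byte sit above three 6-bit continuation groups
(3 * 6 = 18):
    return((c0 & 0x07) << 18)|((c1 & 0x3F) << 12)|((c2 & 0x3F) << 6)|(c3 & 0x3F);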
>                 }
>             }
>         }
>     }
> Linvalid:
>     throw new Exception("yadayada");
> }
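A quick way to see the difference is the highest code point, U+10FFFF, whose UTF-8 encoding is F4 8F BF BF. This is only a sketch: decode4 is a throwaway helper name (not part of the code above) that takes the lead-byte shift as a parameter so both versions can be compared side by side.

```d
import std.stdio : writefln;

// Throwaway helper (hypothetical name): assembles a 4-byte UTF-8
// sequence, with the shift for the lead byte's 3 payload bits as a parameter.
dchar decode4(const(char)[] s, uint leadShift)
{
    const uint b0 = s[0], b1 = s[1], b2 = s[2], b3 = s[3];
    return cast(dchar)(((b0 & 0x07) << leadShift) | ((b1 & 0x3F) << 12)
                     | ((b2 & 0x3F) << 6) | (b3 & 0x3F));
}

void main()
{
    string s = "\U0010FFFF";  // D string literals are UTF-8: F4 8F BF BF
    assert(s.length == 4);
    writefln("shift 18: U+%X", cast(uint) decode4(s, 18)); // U+10FFFF
    writefln("shift 16: U+%X", cast(uint) decode4(s, 16)); // U+4FFFF
}
```

With the shift of 16, the lead byte's bits land two positions too low, so any code point at or above U+40000 decodes to the wrong value.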
>
> The next step will be to loop over lengths 2, 3 and 4, with or
> without your table.
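For the loop, a minimal sketch without a table might look like this. frontLoop is a made-up name, and the table variant is not shown because that table isn't part of this message. Like myFront3, it skips the overlong, invalid-sequence and code-point checks. It only rejects empty or truncated input and lead bytes >= 0xF5. It also takes a const slice and does not advance it. Shifting the accumulator by 6 for each continuation byte produces the 18/12/6 shifts automatically, so the 16-vs-18 slip cannot happen.

```d
import std.exception : enforce;

// Hypothetical loop-based variant of myFront3 (no lookup table).
// Like myFront3 it does no overlong, continuation-byte or code-point
// validation; it only rejects empty/truncated input and leads >= 0xF5.
// Takes a const slice and does not advance it.
dchar frontLoop(const(char)[] str)
{
    enforce(str.length > 0, "empty string");
    uint c = str[0];
    if (c < 0x80)
        return cast(dchar) c;           // ASCII fast path

    // Sequence length and lead-byte payload mask.
    size_t len;
    if (c < 0xE0)      { len = 2; c &= 0x1F; }
    else if (c < 0xF0) { len = 3; c &= 0x0F; }
    else if (c < 0xF5) { len = 4; c &= 0x07; }
    else throw new Exception("invalid lead byte");

    enforce(str.length >= len, "truncated sequence");

    // Each continuation byte adds 6 bits, so a 4-byte lead ends up
    // shifted by 3 * 6 = 18 without writing the shift by hand.
    foreach (i; 1 .. len)
        c = (c << 6) | (str[i] & 0x3F);
    return cast(dchar) c;
}

void main()
{
    import std.stdio : writefln;
    import std.utf : decode;

    foreach (s; ["A", "\u00E9", "\u20AC", "\U0001F600", "\U0010FFFF"])
    {
        size_t i = 0;
        // std.utf.decode validates, so it serves as the reference here.
        assert(frontLoop(s) == decode(s, i));
        writefln("U+%04X", cast(uint) frontLoop(s));
    }
}
```

Comparing against std.utf.decode on valid input is a cheap way to catch mistakes like the wrong shift before measuring speed.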
More information about the Digitalmars-d mailing list