Reducing the cost of autodecoding

Patrick Schluter via Digitalmars-d digitalmars-d at puremagic.com
Sun Oct 16 05:48:11 PDT 2016


On Sunday, 16 October 2016 at 10:05:37 UTC, Patrick Schluter wrote:
> On Sunday, 16 October 2016 at 08:43:23 UTC, Uplink_Coder wrote:
>> On Sunday, 16 October 2016 at 07:59:16 UTC, Patrick Schluter wrote:
>
>> This looks quite slow.
>> We already have a correct version in utf.decodeImpl.
>> The goal here was to find a small and fast alternative.
>
> I know, but it has to be correct before it can be fast.
> The code is simple and the checks can easily be removed. Here is
> the version without the overlong, invalid-sequence and code point
> checks.
>
>  dchar myFront3(ref char[] str)
>  {
>    dchar c0 = str.ptr[0];
>    if(c0 < 0x80) {
>      return c0;
>    }
>    else if(str.length > 1) {
>      dchar c1 = str.ptr[1];
>      if(c0 < 0xE0) {
>        return ((c0 & 0x1F) << 6)|(c1 & 0x3F);
>      }
>      else if(str.length > 2) {
>        dchar c2 = str.ptr[2];
>        if(c0 < 0xF0) {
>          return ((c0 & 0x0F) << 12)|((c1 & 0x3F) << 6)|(c2 & 0x3F);
>        }
>        else if(str.length > 3) {
>          dchar c3 = str.ptr[3];
>          if(c0 < 0xF5) {
>            return ((c0 & 0x07) << 16)|((c1 & 0x3F) << 12)|((c2 & 0x3F) << 6)|(c3 & 0x3F);

Of course, this line is wrong: it should shift by 18, not 16, because the
three continuation bytes below the lead byte carry 6 bits each:
            return ((c0 & 0x07) << 18)|((c1 & 0x3F) << 12)|((c2 & 0x3F) << 6)|(c3 & 0x3F);
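A quick way to check the 18: a four-byte sequence carries 3 + 6 + 6 + 6 = 21 payload bits, so the lead byte's 3 bits start at bit 18. The D sketch below reassembles U+10FFFF (bytes F4 8F BF BF) with both shifts; the helper `assemble4` is hypothetical, written only for this check. It also shows why the typo is easy to miss: every code point below U+40000, such as the emoji in plane 1, has the lead byte 0xF0, whose payload bits are zero, so both shifts give the same result.

```d
import std.stdio : writefln;

// Hypothetical helper: reassembles a 4-byte UTF-8 sequence. `leadShift` is
// the shift applied to the lead byte's 3 payload bits (18 is correct, 16 is
// the typo from the quoted code).
dchar assemble4(const(ubyte)[] b, uint leadShift)
{
    assert(b.length == 4);
    return cast(dchar)(((b[0] & 0x07) << leadShift)
                     | ((b[1] & 0x3F) << 12)
                     | ((b[2] & 0x3F) << 6)
                     |  (b[3] & 0x3F));
}

void main()
{
    // U+10FFFF encodes as F4 8F BF BF; its lead byte carries the bits 100.
    immutable ubyte[] maxCp = [0xF4, 0x8F, 0xBF, 0xBF];
    writefln("shift 18: U+%X", cast(uint) assemble4(maxCp, 18)); // U+10FFFF
    writefln("shift 16: U+%X", cast(uint) assemble4(maxCp, 16)); // U+4FFFF (wrong)

    // U+1F600 encodes as F0 9F 98 80; its lead byte carries 000, so the
    // wrong shift produces the right answer here.
    immutable ubyte[] emoji = [0xF0, 0x9F, 0x98, 0x80];
    assert(assemble4(emoji, 16) == assemble4(emoji, 18));
}
```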

>          }
>        }
>      }
>    }
>    Linvalid:
>       throw new Exception("yadayada");
>  }
>
> The next step will be to loop over lengths 2, 3 and 4, with or
> without your table.
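A minimal sketch of that loop in D, assuming the same "no validation" contract as myFront3. The names `seqLen`, `leadMask` and `decodeFrontLoop` are hypothetical, and since the table referred to above is not shown in the thread, `seqLen` (indexed by `c0 >> 3`) is only an illustration of the idea. The loop folds in 6 bits per continuation byte, so the four-byte case gets its shift of 18 without anyone writing it out. `main` cross-checks the result against Phobos' `std.utf.decode` on well-formed input.

```d
import std.stdio : writefln;
import std.utf : decode;

// Hypothetical length table indexed by `lead >> 3`; 0 marks a byte that
// cannot start a sequence (0x80..0xBF and 0xF8..0xFF).
immutable ubyte[32] seqLen = [
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x00..0x7F
    0, 0, 0, 0, 0, 0, 0, 0,                         // 0x80..0xBF
    2, 2, 2, 2,                                     // 0xC0..0xDF
    3, 3,                                           // 0xE0..0xEF
    4,                                              // 0xF0..0xF7
    0,                                              // 0xF8..0xFF
];

// Payload bits kept from the lead byte, indexed by sequence length.
immutable uint[5] leadMask = [0, 0x7F, 0x1F, 0x0F, 0x07];

// Decodes the first code point of `str` and drops it from the front.
// Like myFront3, it skips the continuation-byte, overlong and U+10FFFF
// checks; it only rejects impossible lead bytes and truncated input.
dchar decodeFrontLoop(ref const(char)[] str)
{
    if (str.length == 0)
        throw new Exception("empty input");
    immutable uint c0 = str[0];
    immutable size_t len = seqLen[c0 >> 3];
    if (len == 0 || len > str.length)
        throw new Exception("invalid or truncated UTF-8 sequence");

    uint cp = c0 & leadMask[len];
    // Each continuation byte contributes 6 bits, so after three of them
    // the lead byte's bits have moved up by 18.
    foreach (i; 1 .. len)
        cp = (cp << 6) | (str[i] & 0x3F);

    str = str[len .. $];
    return cast(dchar) cp;
}

void main()
{
    // 1-, 2-, 3- and 4-byte sequences.
    string sample = "a\u00E9\u20AC\U0001F600";
    const(char)[] rest = sample;
    size_t idx = 0;
    while (rest.length)
    {
        immutable dchar expected = decode(sample, idx);
        immutable dchar got = decodeFrontLoop(rest);
        assert(got == expected);
        writefln("U+%04X", cast(uint) got);
    }
}
```

A comparison chain on `c0`, as in myFront3, can replace `seqLen`; which variant is faster depends on the compiler and the input mix, so it would need measuring.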



