Reducing the cost of autodecoding

Patrick Schluter via Digitalmars-d digitalmars-d at puremagic.com
Sun Oct 16 03:05:37 PDT 2016


On Sunday, 16 October 2016 at 08:43:23 UTC, Uplink_Coder wrote:
> On Sunday, 16 October 2016 at 07:59:16 UTC, Patrick Schluter 
> wrote:

> This looks quite slow.
> We already have a correct version in utf.decodeImpl.
> The goal here was to find a small and fast alternative.

I know, but it has to be correct before it can be fast.
The code is simple and the checks can easily be removed. Here is the 
version without the overlong, invalid-sequence and code point checks.

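  // Decodes the first code point of str without overlong,
  // invalid-sequence or code point checks. Assumes str is non-empty.
  dchar myFront3(ref char[] str)
  {
    dchar c0 = str.ptr[0];
    if(c0 < 0x80) {
      return c0;                        // 1 byte: ASCII
    }
    else if(str.length > 1) {
      dchar c1 = str.ptr[1];
      if(c0 < 0xE0) {                   // 2 bytes: 110xxxxx 10xxxxxx
        return cast(dchar)(((c0 & 0x1F) << 6)|(c1 & 0x3F));
      }
      else if(str.length > 2) {
        dchar c2 = str.ptr[2];
        if(c0 < 0xF0) {                 // 3 bytes: 1110xxxx + 2 trailing
          return cast(dchar)(((c0 & 0x0F) << 12)|((c1 & 0x3F) << 6)|
                             (c2 & 0x3F));
        }
        else if(str.length > 3) {
          dchar c3 = str.ptr[3];
          if(c0 < 0xF5) {               // 4 bytes: 11110xxx + 3 trailing
            return cast(dchar)(((c0 & 0x07) << 18)|((c1 & 0x3F) << 12)|
                               ((c2 & 0x3F) << 6)|(c3 & 0x3F));
          }
        }
      }
    }
    // Reached on a truncated sequence or a lead byte >= 0xF5.
    throw new Exception("yadayada");
  }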

The next step will be to use a loop for lengths 2, 3 and 4, with or 
without your table.
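
For illustration only, here is a rough sketch of what a table-free loop 
for lengths 2, 3 and 4 might look like. The name myFrontLoop is made up 
for this example and it has not been benchmarked. Like myFront3, it 
assumes a non-empty input and skips the overlong, invalid-sequence and 
code point checks. It only rejects lead bytes >= 0xF5 and truncated 
input.

```d
// Hypothetical loop-based variant of myFront3 (illustration only).
// Assumes str is non-empty; does NOT validate continuation bytes,
// overlong encodings or the resulting code point.
dchar myFrontLoop(const(char)[] str)
{
    immutable uint c0 = str[0];
    if (c0 < 0x80)
        return cast(dchar) c0;              // ASCII fast path

    if (c0 >= 0xF5)
        throw new Exception("invalid lead byte");

    // Sequence length derived from the lead byte: 2, 3 or 4.
    immutable size_t len = c0 < 0xE0 ? 2 : c0 < 0xF0 ? 3 : 4;
    if (str.length < len)
        throw new Exception("truncated sequence");

    // Payload bits of the lead byte: 0x1F, 0x0F or 0x07.
    uint cp = c0 & (0x7F >> len);

    // Each trailing byte contributes its low 6 bits.
    foreach (i; 1 .. len)
        cp = (cp << 6) | (str[i] & 0x3F);

    return cast(dchar) cp;
}
```

The mask 0x7F >> len selects the low 5, 4 or 3 bits of the lead byte, 
so one loop covers all three multi-byte lengths without a table.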

