Can you shrink it further?

Matthias Bentrup via Digitalmars-d digitalmars-d at puremagic.com
Tue Oct 11 00:30:26 PDT 2016


On Tuesday, 11 October 2016 at 04:05:47 UTC, Stefan Koch wrote:
> On Tuesday, 11 October 2016 at 03:58:59 UTC, Andrei 
> Alexandrescu wrote:
>> On 10/10/16 11:00 PM, Stefan Koch wrote:
>>> On Tuesday, 11 October 2016 at 02:48:22 UTC, Andrei 
>>> Alexandrescu wrote:
>>>>[...]
>>>
>>> If you want to skip a byte it's easy to do as well.
>>>
>>> void popFront3(ref char[] s) @trusted pure nothrow {
>>>    immutable c = s[0];
>>>    uint char_length = 1;
>>>    if (c < 127)
>>>    {
>>>    Lend :
>>>      s = s.ptr[char_length .. s.length];
>>>    } else {
>>>      if ((c & 0b1100_0000) == 0b1000_0000)
>>>      {
>>>        // just skip one in case this is not the beginning of a code-point char
>>>        goto Lend;
>>>      }
>>>      if (c < 192)
>>>      {
>>>        char_length = 2;
>>>        goto Lend;
>>>      }
>>>      if (c < 240)
>>>      {
>>>        char_length = 3;
>>>        goto Lend;
>>>      }
>>>      if (c < 248)
>>>      {
>>>        char_length = 4;
>>>        goto Lend;
>>>      }
>>>    }
>>>  }
>>>
>>
>> Affirmative. That's identical to the code in "[ ... ]" :o). 
>> Generated code still does a jmp forward though. -- Andrei
>
> It was not identical.
> ((c & 0b1100_0000) == 0b1000_0000)
> Can be true in all of the 3 following cases.
> If we do not do a jmp to return here, we cannot guarantee that 
> we will not skip over the next valid char.
> Thereby corrupting already corrupt strings even more.
>
> For best performance we need to leave the gotos in there.

A branch-free version:

void popFront4(ref char[] s) @trusted pure nothrow {
   immutable c = s[0];
   uint char_length = 1 + (c >= 192) + (c >= 240) + (c >= 248);
   s = s.ptr[char_length .. s.length];
}
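
For comparison, here is a minimal sketch of the same branch-free idea using the standard UTF-8 lead-byte boundaries (0xC0 starts a 2-byte sequence, 0xE0 a 3-byte one, 0xF0 a 4-byte one), which differ from the thresholds used above. The name popFrontSketch and the clamp against s.length are my own assumptions, not something from this thread, and the clamp may cost a conditional move or a branch depending on the compiler.

```d
// Sketch only: popFrontSketch is a hypothetical name, not from the thread.
void popFrontSketch(ref char[] s) @trusted pure nothrow @nogc
{
    immutable c = s[0];
    // Standard UTF-8 lead bytes: 0xC0.. -> 2 bytes, 0xE0.. -> 3, 0xF0.. -> 4.
    // ASCII and stray continuation bytes (0x80 .. 0xBF) advance by one byte.
    // Invalid lead bytes 0xF8 .. 0xFF are treated like 4-byte leads here.
    size_t char_length = 1 + (c >= 0xC0) + (c >= 0xE0) + (c >= 0xF0);
    // Assumption: clamp so a truncated sequence cannot slice past the end.
    if (char_length > s.length)
        char_length = s.length;
    s = s.ptr[char_length .. s.length];
}

unittest
{
    // 'a' (1 byte), U+00E9 (2), U+20AC (3), U+1F600 (4), 'z' (1): 11 bytes.
    char[] s = "a\u00E9\u20AC\U0001F600z".dup;
    assert(s.length == 11);
    popFrontSketch(s); assert(s.length == 10);
    popFrontSketch(s); assert(s.length == 8);
    popFrontSketch(s); assert(s.length == 5);
    popFrontSketch(s); assert(s.length == 1);
    popFrontSketch(s); assert(s.length == 0);
}
```

Like popFront4, this does not validate the continuation bytes; it only decides how far to advance from the first byte.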

Theoretically, char_length could be computed with three sub and 
addc instructions, but I don't know of a compiler that is smart 
enough to detect that.
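
To make the carry idea concrete, here is a rough sketch that emulates the carry flag in portable D: adding 256 - t to an 8-bit value carries out of bit 7 exactly when the value is >= t, so each comparison becomes one add whose carry is accumulated into the length. The helper names carryOut and charLengthViaCarry are hypothetical, and this only illustrates the arithmetic; it makes no claim about the instructions any particular compiler emits.

```d
// Hypothetical helper: carry-out (bit 8) of the 8-bit addition c + k.
// For a ubyte c and 0 < k <= 256, the carry is 1 exactly when c >= 256 - k.
uint carryOut(ubyte c, uint k) pure nothrow @nogc @safe
{
    return (c + k) >> 8;
}

// Same result as 1 + (c >= 192) + (c >= 240) + (c >= 248) from popFront4,
// written as three carry-producing adds that are summed into the length.
uint charLengthViaCarry(ubyte c) pure nothrow @nogc @safe
{
    uint len = 1;
    len += carryOut(c, 256 - 192);
    len += carryOut(c, 256 - 240);
    len += carryOut(c, 256 - 248);
    return len;
}

unittest
{
    // Exhaustively compare against the comparison-based formula.
    foreach (uint i; 0 .. 256)
    {
        immutable c = cast(ubyte) i;
        assert(charLengthViaCarry(c) == 1 + (c >= 192) + (c >= 240) + (c >= 248));
    }
}
```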

