Challenge: write a really really small front() for UTF8

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sun Mar 23 19:25:17 PDT 2014


On 3/23/14, 6:53 PM, Michel Fortin wrote:
> On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>
> Optimizing for smallest assembly size:
>
> dchar front(char[] s)
> {
>   size_t bytesize;
>   dchar result;
>   switch (s[0])
>   {
>     case 0b00000000: .. case 0b01111111:
>         return s[0];
>     case 0b11000000: .. case 0b11011111:
>         return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00111111);
>     case 0b11100000: .. case 0b11101111:
>         result = s[0] & 0b00001111;
>         bytesize = 3;
>         break;
>     case 0b11110000: .. case 0b11110111:
>         result = s[0] & 0b00000111;
>         bytesize = 4;
>         break;
>     default:
>        return dchar.init;
>   }
>   foreach (i; 1..bytesize)
>       result = (result << 6) | (s[i] & 0b00111111);
>   return result;
> }
>

Nice, thanks! I'd hope for a short path for the ASCII subset. Could you
achieve that?
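
For illustration, here is a minimal sketch of what such a short path might look like. It is not code from this thread: the if/else chain standing in for the switch, the uint accumulator, and the const(char)[] parameter are all assumptions made for the example. Like the version quoted above, it does not validate continuation bytes and relies on array bounds checking for truncated input.

```d
// Hypothetical sketch: front() with an early return for ASCII.
dchar front(const(char)[] s)
{
    immutable char c = s[0];

    // Short path: any byte below 0x80 is a complete code point,
    // so ASCII input never reaches the multi-byte logic.
    if (c < 0x80)
        return c;

    uint result;      // accumulates the payload bits
    size_t bytesize;  // total length of the sequence in bytes

    if ((c & 0b1110_0000) == 0b1100_0000)       // 110xxxxx: 2 bytes
    {
        result = c & 0b0001_1111;
        bytesize = 2;
    }
    else if ((c & 0b1111_0000) == 0b1110_0000)  // 1110xxxx: 3 bytes
    {
        result = c & 0b0000_1111;
        bytesize = 3;
    }
    else if ((c & 0b1111_1000) == 0b1111_0000)  // 11110xxx: 4 bytes
    {
        result = c & 0b0000_0111;
        bytesize = 4;
    }
    else
        return dchar.init;  // stray continuation byte or invalid lead byte

    // Each continuation byte contributes its low 6 bits.
    foreach (i; 1 .. bytesize)
        result = (result << 6) | (s[i] & 0b0011_1111);

    return cast(dchar) result;
}
```

With this layout the ASCII case costs one load, one compare, and a return. Whether it also stays small in generated assembly would need to be checked against the baseline.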

Andrei

