Challenge: write a really really small front() for UTF8

John Colvin john.loughran.colvin at gmail.com
Mon Mar 24 09:57:57 PDT 2014


On Monday, 24 March 2014 at 16:41:02 UTC, John Colvin wrote:
> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
> wrote:
>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>
>> Andrei
>
> On a bigendian machine with loose alignment requirements (1 
> byte), you can do this

Same again, but for little-endian, 18 instructions:

http://goo.gl/jlrweQ

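import std.bitmanip : swapEndian;

uint front(char[] s)
{
   if(!(s[0] & 0b1000_0000)) return s[0]; //handle ASCII
   assert(s[0] & 0b0100_0000); //a lead byte, not a continuation byte

   if(s[0] & 0b0010_0000)
   {
     if(s[0] & 0b0001_0000)
     {
       //4-byte sequence: lead byte is 11110xxx
       assert(s.length >= 4 && !(s[0] & 0b1000)
              && s[1] <= 0b1011_1111
              && s[2] <= 0b1011_1111
              && s[3] <= 0b1011_1111);
       //unaligned 4-byte load, swapped so that s[0] is the top byte
       return swapEndian(*(cast(dchar*)(s.ptr)));
     }
     //3-byte sequence: lead byte is 1110xxxx
     assert(s.length >= 3 && s[1] <= 0b1011_1111 && s[2] <= 0b1011_1111);
     //still loads 4 bytes, so the byte after the sequence must be readable;
     //the shift then discards it
     return swapEndian(*(cast(dchar*)(s.ptr))) >> 8;
   }

   //2-byte sequence: lead byte is 110xxxxx
   assert(s.length >= 2 && s[1] <= 0b1011_1111);
   return swapEndian(*(cast(wchar*)(s.ptr)));
}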
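
Worth noting: what the function above returns for a multi-byte sequence is the raw bytes packed most significant first, header bits included, so turning that into a code point still takes a mask-and-shift step. A rough sketch of that step follows, assuming well-formed input; packBytes and toCodePoint are illustrative names, not part of the function above or the baseline, and packBytes just reproduces portably the value the byte-swapped load yields.

```d
// Hypothetical helper (not from the post): packs the first n bytes of s
// most significant first, i.e. the value the byte-swapped load produces.
uint packBytes(const(char)[] s, size_t n)
{
    uint r = 0;
    foreach (i; 0 .. n)
        r = (r << 8) | s[i];
    return r;
}

// Hypothetical helper: drops the UTF-8 header bits from a packed n-byte
// sequence and returns the code point. Assumes the sequence is well formed.
uint toCodePoint(uint p, size_t n)
{
    switch (n)
    {
        case 1: return p;
        case 2: return ((p >> 2) & 0x7C0) | (p & 0x3F);
        case 3: return ((p >> 4) & 0xF000) | ((p >> 2) & 0xFC0) | (p & 0x3F);
        case 4: return ((p >> 6) & 0x1C0000) | ((p >> 4) & 0x3F000)
                     | ((p >> 2) & 0xFC0) | (p & 0x3F);
        default: assert(0, "sequence length must be 1 to 4");
    }
}

void main()
{
    // U+00E9 is encoded as C3 A9
    auto packed = packBytes("\xC3\xA9", 2);
    assert(packed == 0xC3A9);                  // packed bytes, not the code point
    assert(toCodePoint(packed, 2) == 0xE9);

    // U+20AC is encoded as E2 82 AC
    assert(toCodePoint(packBytes("\xE2\x82\xAC", 3), 3) == 0x20AC);

    // U+1F600 is encoded as F0 9F 98 80
    assert(toCodePoint(packBytes("\xF0\x9F\x98\x80", 4), 4) == 0x1F600);
}
```

The extra shifts and masks would add to the instruction count, so this is only meant to show what is left to do, not to compete with the numbers above.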

