Challenge: write a really really small front() for UTF8

John Colvin john.loughran.colvin at gmail.com
Mon Mar 24 09:41:01 PDT 2014


On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
wrote:
> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>
> Andrei

On a big-endian machine with loose alignment requirements (1 
byte), you can do this, which comes down to 13 instructions on x86 
(a count that is of course meaningless, since x86 has the wrong 
endianness):

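uint front(char[] s) {
   if(!(s[0] & 0b1000_0000)) return s[0]; //handle ASCII
   assert(s[0] & 0b0100_0000); //lead byte, not a continuation byte

   if(s[0] & 0b0010_0000)
   {
     if(s[0] & 0b0001_0000)
     {
       //4-byte sequence: one big-endian 32-bit load takes all 4 bytes
       assert(s.length >= 4 && !(s[0] & 0b1000)
              && s[1] <= 0b1011_1111
              && s[2] <= 0b1011_1111
              && s[3] <= 0b1011_1111);
       return *(cast(dchar*)(s.ptr));
     }
     //3-byte sequence: load 32 bits, then shift out the 4th byte
     //(note: the load reads one byte past a 3-byte slice)
     assert(s.length >= 3 && s[1] <= 0b1011_1111 && s[2] <= 0b1011_1111);
     return *(cast(dchar*)(s.ptr)) >> 8;
   }

   //2-byte sequence: one big-endian 16-bit load
   assert(s.length >= 2 && s[1] <= 0b1011_1111);
   return *(cast(wchar*)(s.ptr));
}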

http://goo.gl/Kf6RZJ
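
For anyone wanting to try the idea on a little-endian box, here is a rough, portable sketch of what the pointer casts above should produce on a big-endian machine: the raw UTF-8 bytes of the first sequence packed into a uint, most significant byte first. The name frontPacked is made up for illustration and is not part of the original code. It builds the value byte by byte, so it needs no unaligned loads and never reads past the slice, at the cost of more instructions.

```d
// Hypothetical helper (not from the original post): builds, one byte at a
// time, the value the pointer-cast version would load on a big-endian machine.
uint frontPacked(const(char)[] s)
{
    assert(s.length >= 1);
    immutable uint b0 = s[0];
    if (!(b0 & 0b1000_0000)) return b0;       // 0xxx_xxxx: ASCII, 1 byte
    assert(b0 & 0b0100_0000);                 // must not be a continuation byte

    if (!(b0 & 0b0010_0000))                  // 110x_xxxx: 2-byte sequence
    {
        assert(s.length >= 2);
        return (b0 << 8) | s[1];
    }
    if (!(b0 & 0b0001_0000))                  // 1110_xxxx: 3-byte sequence
    {
        assert(s.length >= 3);
        return (b0 << 16) | (cast(uint) s[1] << 8) | s[2];
    }
    assert(s.length >= 4 && !(b0 & 0b1000));  // 1111_0xxx: 4-byte sequence
    return (b0 << 24) | (cast(uint) s[1] << 16)
         | (cast(uint) s[2] << 8) | s[3];
}

unittest
{
    assert(frontPacked("A") == 0x41);                      // 1 byte
    assert(frontPacked("\xC3\xA9") == 0xC3A9);             // 2 bytes
    assert(frontPacked("\xE2\x82\xAC") == 0xE282AC);       // 3 bytes
    assert(frontPacked("\xF0\x9F\x98\x80") == 0xF09F9880); // 4 bytes
}
```

Note that the result is the packed encoded bytes rather than the decoded code point, so a caller would still have to strip the marker bits to get a dchar.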


There may be architectures that can benefit from this.

