Challenge: write a really really small front() for UTF8
John Colvin
john.loughran.colvin at gmail.com
Mon Mar 24 09:41:01 PDT 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu
wrote:
> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>
> Andrei
On a big-endian machine with loose alignment requirements (1
byte), you can do this, which is down to 13 instructions on x86
(which is of course meaningless, what with it being the wrong
endianness):
uint front(char[] s) {
    if(!(s[0] & 0b1000_0000)) return s[0]; //handle ASCII
    assert(s[0] & 0b0100_0000);
    if(s[0] & 0b0010_0000)
    {
        if(s[0] & 0b0001_0000)
        {
            //4-byte sequence: load all four bytes at once
            assert(s.length >= 4 && !(s[0] & 0b1000)
                && s[1] <= 0b1011_1111
                && s[2] <= 0b1011_1111
                && s[3] <= 0b1011_1111);
            return *(cast(dchar*)(s.ptr));
        }
        //3-byte sequence: load four bytes, shift out the extra one
        //(the load touches s[3], which the assert below does not cover)
        assert(s.length >= 3 && s[1] <= 0b1011_1111 && s[2] <= 0b1011_1111);
        return *(cast(dchar*)(s.ptr)) >> 8;
    }
    //2-byte sequence
    assert(s.length >= 2 && s[1] <= 0b1011_1111);
    return *(cast(wchar*)(s.ptr));
}
http://goo.gl/Kf6RZJ
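For anyone who wants to try the idea on a little-endian box, here is a rough sketch that builds the same value with shifts instead of pointer casts. The name packedFront is made up for illustration and is not part of the code above. As written, the loads above return the encoded bytes packed into an integer rather than a decoded code point, and this sketch deliberately mirrors that. It also skips the continuation-byte checks and only asserts the lead byte and the length.

```d
// Hypothetical helper, not from the original post: produces the value the
// big-endian pointer loads would yield, independent of host byte order.
uint packedFront(const(char)[] s)
{
    uint b0 = s[0];
    if (b0 < 0x80) return b0;      // ASCII: a single byte

    assert(b0 & 0x40);             // a continuation byte cannot start a sequence

    // Sequence length from the lead byte: 110x -> 2, 1110 -> 3, 1111 -> 4.
    size_t n = (b0 & 0x20) ? ((b0 & 0x10) ? 4 : 3) : 2;
    assert(s.length >= n);

    // Concatenate the bytes most significant first, as a big-endian load would.
    uint r = b0;
    foreach (i; 1 .. n)
        r = (r << 8) | s[i];
    return r;
}
```

With this, a string starting with U+00E9 (bytes C3 A9) gives 0xC3A9, and one starting with U+20AC (bytes E2 82 AC) gives 0xE282AC.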
There may be architectures that can benefit from this.