Challenge: write a really really small front() for UTF8

Michel Fortin michel.fortin at michelf.ca
Sun Mar 23 18:54:33 PDT 2014


On 2014-03-23 22:58:32 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail at erdani.org> said:

> Array bounds checking takes care of that.

If you want the smallest assembly size with array bounds checking, make 
the function @safe and see what a disaster that is for the assembly size. 
That's what you have to optimize. If you're going to optimize while 
looking at the assembly, you're better off checking the bounds manually:

dchar front(char[] s)
{
  // Manual bounds check: an empty slice has no front.
  if (s.length < 1) return dchar.init;
  size_t bytesize;
  dchar result;
  switch (s[0])
  {
    case 0b00000000: .. case 0b01111111:
      // ASCII: a single byte, returned as-is.
      return s[0];
    case 0b11000000: .. case 0b11011111:
      result = s[0] & 0b00011111;
      bytesize = 2;
      break;
    case 0b11100000: .. case 0b11101111:
      result = s[0] & 0b00001111;
      bytesize = 3;
      break;
    case 0b11110000: .. case 0b11110111:
      result = s[0] & 0b00000111;
      bytesize = 4;
      break;
    default:
      // Continuation byte or invalid lead byte.
      return dchar.init;
  }
  // Manual bounds check for the continuation bytes.
  if (s.length < bytesize) return dchar.init;
  foreach (i; 1..bytesize)
    result = (result << 6) | (s[i] & 0b00111111);
  return result;
}
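
To make the masking and shifting concrete, here is a minimal sketch of the 
two-byte case worked by hand. The helper name decodeTwoBytes is hypothetical 
and only there for illustration; it is not part of front() and does none of 
its validation. The explicit casts are my addition to keep the conversions 
to dchar unambiguous.

```d
// Hypothetical helper (not from the post above): decodes one 2-byte
// UTF-8 sequence, assuming the caller already knows it is well formed.
dchar decodeTwoBytes(ubyte lead, ubyte cont)
{
    // Lead byte 110xxxxx: keep the low 5 payload bits.
    dchar result = cast(dchar)(lead & 0b00011111);
    // Continuation byte 10xxxxxx: shift in its low 6 payload bits.
    return cast(dchar)((result << 6) | (cont & 0b00111111));
}

void main()
{
    // U+00E9 is encoded as the bytes 0xC3 0xA9:
    // 0xC3 & 0x1F = 0x03, then (0x03 << 6) | (0xA9 & 0x3F) = 0xE9.
    assert(decodeTwoBytes(0xC3, 0xA9) == 0xE9);
    // U+00A9 is encoded as the bytes 0xC2 0xA9.
    assert(decodeTwoBytes(0xC2, 0xA9) == 0xA9);
}
```

The three- and four-byte cases follow the same pattern, with a narrower mask 
on the lead byte and one more shift-and-or per continuation byte.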



-- 
Michel Fortin
michel.fortin at michelf.ca
http://michelf.ca


