Reducing the cost of autodecoding

Sun Oct 16 11:29:05 PDT 2016

On Sunday, 16 October 2016 at 10:05:37 UTC, Patrick Schluter 
wrote:
>
> Next step will be to loop for length 2,3,4, with or without 
> your table.

Ok now the looped version which doesn't need the lookup table. 
This one assembles in 72 lines of assembly (25 lines only for the 
exception code).

dchar myFront5(ref char[] str)
  {
    byte c0 = str[0];
    if(c0 >= 0) {
      return c0;
    }
    else {
      if(((c0!=-64) & (c0!=-63)) && c0 <= -12 ) {
	 auto l = str.length;

      int idx = 1;
      dchar mask0 = 0;
      dchar c1=c0&0x3f;
      int lim = -64;
      while(l--) {
          if(c0<lim) {
            if(c1 >0x10FFFF) break;
            return c1;
          }
          lim >>= 1 ;
          c1 = ((c1<<6) | (str.ptr[idx++]&0x3F)) & ~mask0;
          mask0 = 1 << (idx*6 + 7-idx);
        }
      }
    }
    Linvalid:
       throw new Exception("yadayada");
  }

Explanations of the tricks used:
1. the first character is read as signed byte with sign 
extension, this allows to compare it to the lim variable which is 
used to do mainly what the lookup table was doing.
2. there 2 loop escapes, if l, the variable holding the length of 
the string, is 0 which means that the string is too short or when 
the lim is bigger than the 1st character (see 1.)
3. Using the same 0x3f mask to extract the data bits from all 
bytes in the loop has the drawback of adding spurious bits coming 
from the 1st byte for 3 and 4 long sequences. The mask0 variable 
is used to "shoot" that bit in the next loop.
4. 2 simple tests allow to eliminate a lot of invalid cases 
(overlongs on 3 and 4 sequences are not tested yet though but I 
think there's a simple way of doing it but I'm too tired now to 
exlore it).