Challenge: write a really really small front() for UTF8
safety0ff
safety0ff.dev at gmail.com
Sun Mar 23 21:58:08 PDT 2014
On Monday, 24 March 2014 at 04:37:23 UTC, Michel Fortin wrote:
> dchar front(char[] s)
> {
>     if (s[0] < 0b1000000)
>         return s[0]; // ASCII
>     auto indicator = (s[0] >> 5) & 0b11;
>     auto tailLength = indicator ? indicator : 1;
>
>     dchar result = s[0] & (0b00111111 >> tailLength);
>     foreach (i; 0..tailLength)
>         result = (result << 6) | (s[1+i] & 0b00111111);
>     return result;
> }
>
> (Disclaimer: not tested, but I did check that all the expected
> code paths are present in the assembly this time.)
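0b1000000 is missing a zero; it should be 0b1000_0000 (0x80). As written it is 0x40, so the ASCII bytes 0x40..0x7F skip the early return and go down the multi-byte path.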
Fixing that, I still get a range violation from "s[1+i]".
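One likely cause, going by the bit patterns: `(s[0] >> 5) & 0b11` reads bits 6..5 of the lead byte. That gives 2 for leads 0xC0..0xDF (a 2-byte sequence has only one tail byte) and 3 for leads 0xE0..0xEF (which have two), so `s[1+i]` indexes one past the end of a well-formed 2- or 3-byte string. Only the 4-byte leads come out right. The sketch below assumes the intent was bits 5..4, i.e. `>> 4`; `frontFixed` is a hypothetical name, and like the original it assumes non-empty, well-formed input and does no validation.

```d
// Sketch: the quoted front() with the 0x80 threshold and a `>> 4` shift.
// Assumes s is non-empty and starts with one well-formed UTF-8 sequence.
// No validation is done; frontFixed is a hypothetical name.
dchar frontFixed(const(char)[] s)
{
    if (s[0] < 0b1000_0000)   // 0x80, not 0x40
        return s[0];          // ASCII: the byte is the code point

    // Bits 5..4 of the lead byte: 110x_xxxx -> 0 or 1, 1110_xxxx -> 2,
    // 1111_0xxx -> 3. The original `>> 5` read bits 6..5 instead.
    immutable uint indicator = (s[0] >> 4) & 0b11;
    immutable uint tailLength = indicator ? indicator : 1;

    // Payload bits of the lead byte (5, 4 or 3), then 6 bits per tail byte.
    uint result = s[0] & (0b0011_1111 >> tailLength);
    foreach (i; 0 .. tailLength)
        result = (result << 6) | (s[1 + i] & 0b0011_1111);
    return cast(dchar) result;
}

unittest
{
    // 0xC3 starts a 2-byte sequence: the old shift yields 2 tail bytes.
    assert(((0xC3 >> 5) & 0b11) == 2);
    assert(((0xC3 >> 4) & 0b11) == 0); // falls back to tailLength 1

    assert(frontFixed("A") == 'A');
    assert(frontFixed("\u00E9") == 0xE9);
    assert(frontFixed("\u20AC") == 0x20AC);
    assert(frontFixed("\U0001D11E") == 0x1D11E);
}
```

Keeping the `indicator ? indicator : 1` trick means both halves of the 110x_xxxx range (indicator 0 or 1) map to one tail byte, and the lead-byte mask `0b0011_1111 >> tailLength` then yields exactly 5, 4 or 3 payload bits.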
----- Test program -----
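import std.stdio : writeln;

// front is the function under test; it is compared against another
// implementation, front2, which is not shown here.
void main()
{
    // 1-byte sequences: 0x00..0x7F
    foreach (ubyte b0; 0 .. 0x80)
    {
        char[] s = [b0];
        assert(front(s) == front2(s));
    }
    writeln("Single byte done");

    // 2-byte sequences: lead 0xC0..0xDF, one continuation byte
    foreach (ubyte b0; 0 .. 0x40)
        foreach (ubyte b1; 0 .. 0x20)
        {
            char[] s = [0xC0 | b1, 0x80 | b0];
            assert(front(s) == front2(s));
        }
    writeln("Double byte done");

    // 3-byte sequences: lead 0xE0..0xEF, two continuation bytes
    foreach (ubyte b0; 0 .. 0x40)
        foreach (ubyte b1; 0 .. 0x40)
            foreach (ubyte b2; 0 .. 0x10)
            {
                char[] s = [0xE0 | b2, 0x80 | b1, 0x80 | b0];
                assert(front(s) == front2(s));
            }
    writeln("Triple byte done");

    // 4-byte sequences: lead 0xF0..0xF7, three continuation bytes
    foreach (ubyte b0; 0 .. 0x40)
        foreach (ubyte b1; 0 .. 0x40)
            foreach (ubyte b2; 0 .. 0x40)
                foreach (ubyte b3; 0 .. 0x08)
                {
                    char[] s = [0xF0 | b3, 0x80 | b2, 0x80 | b1, 0x80 | b0];
                    assert(front(s) == front2(s));
                }
}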
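The post does not define front2. The test's inputs include overlong forms (leads 0xC0 and 0xC1, and 0xE0 or 0xF0 with small second bytes), UTF-16 surrogates (0xED followed by 0xA0..0xBF) and values above U+10FFFF (leads 0xF5..0xF7, and 0xF4 with a second byte above 0x8F), so whatever front2 is, it presumably does not reject them. As a hypothetical stand-in, `referenceFront` below is a plain non-validating decoder that branches on the lead byte; the name and the approach are assumptions, not what the thread used.

```d
// Hypothetical stand-in for front2: a plain, non-validating decoder that
// branches on the lead byte. Assumes s starts with a complete sequence;
// a stray continuation byte (0x80..0xBF) as the lead is treated like a
// 2-byte lead and gives a meaningless result.
dchar referenceFront(const(char)[] s)
{
    immutable uint b = s[0];
    uint cp;
    size_t tail;
    if (b < 0x80)
        return cast(dchar) b;                      // 0xxx_xxxx
    if (b < 0xE0)      { cp = b & 0x1F; tail = 1; } // 110x_xxxx
    else if (b < 0xF0) { cp = b & 0x0F; tail = 2; } // 1110_xxxx
    else               { cp = b & 0x07; tail = 3; } // 1111_0xxx
    foreach (i; 1 .. tail + 1)
        cp = (cp << 6) | (s[i] & 0x3F);            // append 6 payload bits
    return cast(dchar) cp;
}

unittest
{
    import std.utf : encode;

    // Round-trip every scalar value through std.utf.encode and back.
    // Surrogates are skipped because encode rejects them.
    foreach (uint u; 0 .. 0x11_0000)
    {
        if (u >= 0xD800 && u <= 0xDFFF)
            continue;
        char[4] buf;
        immutable len = encode(buf, cast(dchar) u);
        assert(referenceFront(buf[0 .. len]) == u);
    }
}
```

Checking the stand-in against Phobos first means a mismatch in the exhaustive test points at front rather than at the reference. Because it does not validate, it also returns a value for the overlong and out-of-range inputs the test generates, e.g. 0 for `C0 80`.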
More information about the Digitalmars-d mailing list