Challenge: write a really really small front() for UTF8
dnspies
dspies at ualberta.ca
Mon Mar 24 10:14:09 PDT 2014
It seems like mine wasn't being inlined because I had carelessly
replaced char[] with const(char)[] in the function signature (I
don't know why that should make a difference, but it does)
Going back to my original version (with Andre's trick to avoid
letting the loop unroll), I get
dnspies_orig 159 - 16 labels = 143 instructions
However, MFortin1 can be made to do better by taking successive
slices instead of indexing (and commenting out the check, since
it no longer helps optimize).
MFortin1_slice 137 - 13 labels = 124 instructions
http://goo.gl/JAgVK1
On Monday, 24 March 2014 at 13:39:45 UTC, Michel Fortin wrote:
> On 2014-03-24 06:08:30 +0000, "safety0ff"
> <safety0ff.dev at gmail.com> said:
>
>> Everything seems to work in your corrected versions:
>> http://dpaste.dzfl.pl/3b32535559ba
>>
>> Andrei's version doesn't compile on recent compiler versions
>> due to goto skipping initialization.
>
> Ok, so let's check what happens in actual usage. In other
> words, what happens when front is inlined and used in a loop.
> I'm counting the number of assembly lines of this main function
> for looping through each possible input using the various
> implementations:
>
> http://goo.gl/QjnA2k
>
> implementation line count of main in assembly
> andralex 285 - 9 labels = 276 instructions
> MFortin1 135 - 13 labels = 122 instructions
> MFortin2 150 - 14 labels = 136 instructions
> safety0ff -- not inlined (too big?) --
> dnspies 161 - 15 labels = 146 instructions
> CWilliams 233 - 25 labels = 208 instructions
>
> For compactness, my first implementation seems to be the
> winner, both with and without inlining.
>
> That said, I think front should only assume a non-empty char[]
> and thus should check the length before accessing anything
> after the first byte to protect against incomplete multi-byte
> sequences such as [0x1000_0000]. So I added this line to my two
> implementations:
>
> if (1+tailLength >= s.length) return dchar.init;
>
> Now lets see what happens with those changes:
>
> http://goo.gl/XPCGYE
>
> implementation line count of main in assembly
> MFortin1Check 103 - 11 labels = 92 instructions
> MFortin2Check 135 - 13 labels = 122 instructions
>
> Now I'm baffled. Adding a test makes the code shorter? It
> actually make the standalone functions longer by 8 instructions
> (as expected), but the inlined version is shorter. My guess is
> that the optimizer creates something entirely different and it
> turns out that this different version optimises better after
> inlining.
>
> That said, my test "main" function isn't very representative of
> the general case because the length is statically known by the
> optimizer. Let's see what it does when the length is not
> statically known:
>
> http://goo.gl/E2Q0Yu
>
> implementation line count of main in assembly
> andralex 384 - 9 labels = 375 instructions
> MFortin1 173 - 19 labels = 154 instructions
> MFortin2 182 - 18 labels = 164 instructions
> safety0ff -- not inlined --
> dnspies -- not inlined --
> CWilliams 229 - 23 labels = 206 instructions
> MFortin1Check 211 - 24 labels = 187 instructions
> MFortin2Check 218 - 21 labels = 197 instructions
>
> So again MFortin1 is the winner for compactness. Still, I
> maintain that we ought to at least check the length of the
> array for multi-byte sequences to protect against incomplete
> sequences that could lay at the end of the string, so I'd favor
> MFortin1Check.
More information about the Digitalmars-d
mailing list