Challenge: write a really really small front() for UTF8

dnspies dspies at ualberta.ca
Mon Mar 24 10:14:09 PDT 2014


It seems like mine wasn't being inlined because I had carelessly 
replaced char[] with const(char)[] in the function signature (I 
don't know why that should make a difference, but it does)

Going back to my original version (with Andre's trick to avoid 
letting the loop unroll), I get
dnspies_orig   159 - 16 labels = 143 instructions

However, MFortin1 can be made to do better by taking successive 
slices instead of indexing (and commenting out the check, since 
it no longer helps optimize).
MFortin1_slice 137 - 13 labels = 124 instructions

http://goo.gl/JAgVK1

On Monday, 24 March 2014 at 13:39:45 UTC, Michel Fortin wrote:
> On 2014-03-24 06:08:30 +0000, "safety0ff" 
> <safety0ff.dev at gmail.com> said:
>
>> Everything seems to work in your corrected versions:
>> http://dpaste.dzfl.pl/3b32535559ba
>> 
>> Andrei's version doesn't compile on recent compiler versions 
>> due to goto skipping initialization.
>
> Ok, so let's check what happens in actual usage. In other 
> words, what happens when front is inlined and used in a loop. 
> I'm counting the number of assembly lines of this main function 
> for looping through each possible input using the various 
> implementations:
>
> http://goo.gl/QjnA2k
>
> implementation   line count of main in assembly
> andralex         285 -  9 labels = 276 instructions
> MFortin1         135 - 13 labels = 122 instructions
> MFortin2         150 - 14 labels = 136 instructions
> safety0ff         -- not inlined (too big?) --
> dnspies          161 - 15 labels = 146 instructions
> CWilliams        233 - 25 labels = 208 instructions
>
> For compactness, my first implementation seems to be the 
> winner, both with and without inlining.
>
> That said, I think front should only assume a non-empty char[] 
> and thus should check the length before accessing anything 
> after the first byte to protect against incomplete multi-byte 
> sequences such as [0x1000_0000]. So I added this line to my two 
> implementations:
>
>    if (1+tailLength >= s.length) return dchar.init;
>
> Now lets see what happens with those changes:
>
> http://goo.gl/XPCGYE
>
> implementation   line count of main in assembly
> MFortin1Check    103 - 11 labels =  92 instructions
> MFortin2Check    135 - 13 labels = 122 instructions
>
> Now I'm baffled. Adding a test makes the code shorter? It 
> actually make the standalone functions longer by 8 instructions 
> (as expected), but the inlined version is shorter. My guess is 
> that the optimizer creates something entirely different and it 
> turns out that this different version optimises better after 
> inlining.
>
> That said, my test "main" function isn't very representative of 
> the general case because the length is statically known by the 
> optimizer. Let's see what it does when the length is not 
> statically known:
>
> http://goo.gl/E2Q0Yu
>
> implementation   line count of main in assembly
> andralex         384 -  9 labels = 375 instructions
> MFortin1         173 - 19 labels = 154 instructions
> MFortin2         182 - 18 labels = 164 instructions
> safety0ff         -- not inlined --
> dnspies           -- not inlined --
> CWilliams        229 - 23 labels = 206 instructions
> MFortin1Check    211 - 24 labels = 187 instructions
> MFortin2Check    218 - 21 labels = 197 instructions
>
> So again MFortin1 is the winner for compactness. Still, I 
> maintain that we ought to at least check the length of the 
> array for multi-byte sequences to protect against incomplete 
> sequences that could lay at the end of the string, so I'd favor 
> MFortin1Check.



More information about the Digitalmars-d mailing list