Force LDC/LLVM to split a struct local (into registers)?

kinke noone at nowhere.com
Thu Sep 19 23:14:49 UTC 2019


On Thursday, 19 September 2019 at 21:40:11 UTC, Vladimir 
Panteleev wrote:
> Hi,
>
> This change increases the function's execution time by 50%:
>
> https://github.com/CyberShadow/chunker/commit/inline-hash
>
> To reproduce, get the code and run:
> [...]

Can reproduce on Win64 (after fixing up your temp file path 
changes ;)). As the function is pretty long, I didn't want to 
look at the changes in optimized IR (`-output-ll`), so I 
experimented a bit. The following patch restores previous 
performance:

@@ -374,8 +374,8 @@ struct Chunker(R)
                         foreach (_, b; buf[state.bpos .. state.bmax])
                         {
                                 // slide(b)
-                               auto out_ = hash.window[hash.wpos];
-                               hash.window[hash.wpos] = b;
+                               auto out_ = state.hash.window[hash.wpos];
+                               state.hash.window[hash.wpos] = b;
                                 hash.digest ^= ulong(tabout[out_].value);
                                 hash.wpos++;
                                 if (hash.wpos >= windowSize)
@@ -415,7 +415,8 @@ struct Chunker(R)
                                         return chunk;
                                 }
                         }
-                       state.hash = hash;
+                       state.hash.wpos = hash.wpos;
+                       state.hash.digest = hash.digest;

                         auto steps = state.bmax - state.bpos;
                         if (steps > 0)

I.e., the loop no longer reads from or writes to the local copy of 
the 64-byte hash.window buffer, but accesses the original buffer in 
state.hash directly. Only wpos and digest are copied back afterwards.
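
To illustrate the pattern outside of the chunker code, here is a minimal sketch. All names in it (Hash, State, mix, feedCopy, feedDirect) are hypothetical stand-ins, and mix is a placeholder, not the real rolling hash. It only shows the two ways of structuring the loop and checks that they leave the same state behind. Whether the optimizer keeps the scalars in registers in either variant will depend on the LDC/LLVM version and flags, so treat it as something to measure, not as a guarantee.

```d
enum windowSize = 64;

// Hypothetical stand-in for the hash state touched by the patch.
struct Hash
{
    ubyte[windowSize] window; // 64-byte sliding window
    uint wpos;
    ulong digest;
}

struct State
{
    Hash hash;
}

// Placeholder mixing step (an assumption, not the chunker's polynomial hash).
ulong mix(ulong digest, ubyte outgoing, ubyte incoming)
{
    return ((digest << 1) | (digest >> 63)) ^ outgoing ^ (ulong(incoming) << 8);
}

// Variant A: copy the whole struct, window included, into a local.
void feedCopy(ref State state, const(ubyte)[] buf)
{
    auto hash = state.hash;
    foreach (b; buf)
    {
        auto out_ = hash.window[hash.wpos];
        hash.window[hash.wpos] = b;
        hash.digest = mix(hash.digest, out_, b);
        hash.wpos++;
        if (hash.wpos >= windowSize)
            hash.wpos = 0;
    }
    state.hash = hash; // writes the 64-byte buffer back as well
}

// Variant B: keep only the scalars local, access the window in place.
void feedDirect(ref State state, const(ubyte)[] buf)
{
    uint wpos = state.hash.wpos;
    ulong digest = state.hash.digest;
    foreach (b; buf)
    {
        auto out_ = state.hash.window[wpos];
        state.hash.window[wpos] = b;
        digest = mix(digest, out_, b);
        wpos++;
        if (wpos >= windowSize)
            wpos = 0;
    }
    state.hash.wpos = wpos;     // only the scalars are copied back
    state.hash.digest = digest;
}

void main()
{
    ubyte[100] data;
    foreach (i, ref d; data)
        d = cast(ubyte) i;

    State a, b;
    feedCopy(a, data[]);
    feedDirect(b, data[]);

    assert(a == b);            // both variants produce the same state
    assert(a.hash.wpos == 36); // 100 bytes modulo the 64-byte window
}
```

Variant B corresponds to what the patch above does: the window stays where it is, and only wpos and digest live in locals for the duration of the loop.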

