Force LDC/LLVM to split a struct local (into registers)?
kinke
noone at nowhere.com
Thu Sep 19 23:14:49 UTC 2019
On Thursday, 19 September 2019 at 21:40:11 UTC, Vladimir
Panteleev wrote:
> Hi,
>
> This change increases the function's execution time by 50%:
>
> https://github.com/CyberShadow/chunker/commit/inline-hash
>
> To reproduce, get the code and run:
> [...]
Can reproduce on Win64 (after fixing up your temp file path
changes ;)). As the function is pretty long, I didn't want to
look at the changes in optimized IR (`-output-ll`), so I
experimented a bit. The following patch restores previous
performance:
@@ -374,8 +374,8 @@ struct Chunker(R)
                 foreach (_, b; buf[state.bpos .. state.bmax])
                 {
                     // slide(b)
-                    auto out_ = hash.window[hash.wpos];
-                    hash.window[hash.wpos] = b;
+                    auto out_ = state.hash.window[hash.wpos];
+                    state.hash.window[hash.wpos] = b;
                     hash.digest ^= ulong(tabout[out_].value);
                     hash.wpos++;
                     if (hash.wpos >= windowSize)
@@ -415,7 +415,8 @@ struct Chunker(R)
                         return chunk;
                     }
                 }
-                state.hash = hash;
+                state.hash.wpos = hash.wpos;
+                state.hash.digest = hash.digest;
                 auto steps = state.bmax - state.bpos;
                 if (steps > 0)
I.e., the loop no longer reads from and writes to the local copy of
the 64-byte hash.window buffer; it accesses the original buffer in
state.hash directly, and only wpos and digest are written back
afterwards.