Requesting Help with Optimizing Code
Max Haughton
maxhaton at gmail.com
Thu Apr 8 03:58:36 UTC 2021
On Thursday, 8 April 2021 at 03:45:06 UTC, tsbockman wrote:
> On Thursday, 8 April 2021 at 03:27:12 UTC, Max Haughton wrote:
>> Although the obvious point here is vector width (you have
>> AVX-512 from what I can see, however I'm not sure if this is
>> actually a win or not on Skylake W)
>
> From what I've seen, LLVM's code generation and optimization
> for AVX-512 auto-vectorization is still quite bad and immature
> compared to AVX2 and earlier, and the wider the SIMD register
> the more that data structures and algorithms have to be
> specifically tailored to really benefit from them. Also, using
> AVX-512 instructions forces the CPU to downclock.
>
> So, I wouldn't expect much benefit from AVX-512 for the time
> being, unless you're going to hand optimize for it.
>
>> For LDC, you'll want `-mcpu=native`.
>
> Only do this if you don't care about the binary working on any
> CPU but your own. Otherwise, you need to look at something like
> the Steam Hardware survey and decide what percentage of the
> market you want to capture (open the "Other Settings" section):
> https://store.steampowered.com/hwsurvey
You can do multiversioning fairly easily these days.
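
To make "multiversioning" concrete, here is a minimal sketch of manual runtime dispatch. It assumes LDC's `@target` attribute from `ldc.attributes` and the feature flags in `core.cpuid`; the names `kernelBody`, `scaleBaseline`, `scaleAvx2`, `pickKernel`, `scale` and the version identifier `UseTargetAttr` are made up for illustration. On other compilers or architectures it simply falls back to the baseline build of the kernel.

```d
import core.cpuid : avx2;

// Hypothetical kernel, kept as a string so both builds share one source.
enum kernelBody = q{
    assert(dst.length == src.length);
    foreach (i; 0 .. src.length)
        dst[i] = src[i] * k;
};

// Only use the target attribute where it is expected to be available.
version (LDC)
{
    version (X86_64)
    {
        version = UseTargetAttr;
    }
}

// Baseline build: compiled for whatever -mcpu/-mattr the module gets.
void scaleBaseline(float[] dst, const(float)[] src, float k)
{
    mixin(kernelBody);
}

version (UseTargetAttr)
{
    import ldc.attributes : target;

    // Same source, but the backend may use AVX2 inside this function only.
    @target("avx2")
    void scaleAvx2(float[] dst, const(float)[] src, float k)
    {
        mixin(kernelBody);
    }
}

alias Kernel = void function(float[] dst, const(float)[] src, float k);

// Pick an implementation based on what the running CPU reports.
Kernel pickKernel()
{
    version (UseTargetAttr)
    {
        if (avx2)
            return &scaleAvx2;
    }
    return &scaleBaseline;
}

// Resolved once at program start; callers just write scale(dst, src, k).
immutable Kernel scale;

shared static this()
{
    scale = pickKernel();
}
```

The binary still runs on CPUs without AVX2 because the wider code is only reached after the runtime check. Whether the AVX2 build is actually faster is something to measure, not assume.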
AVX-512 downclocking can also be quite complicated: I have seen
benchmarks where one can still achieve a decent speedup even
*with* downclocking. At the very least it's worth profiling. The
reason I brought up Skylake W specifically is that, IIRC, some of
the earlier ones actually emulated the 512-bit vector
instructions rather than having proper support in the functional
units.
D needs finer-grained control of the optimizer *inside* loops -
e.g. I don't care about inlining writeln, but if something
doesn't get inlined inside a hot loop you're fucked.
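
For context, the control D offers today is per function rather than per call site or per loop. A minimal sketch using `pragma(inline, true)`; the helper names `lerp` and `blend` are made up for illustration:

```d
// Ask the compiler to always inline this helper, wherever it is called.
pragma(inline, true)
float lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}

// Hot loop: the call to lerp is the one we care about getting inlined.
float[] blend(const(float)[] a, const(float)[] b, float t)
{
    assert(a.length == b.length);
    auto result = new float[a.length];
    foreach (i; 0 .. a.length)
        result[i] = lerp(a[i], b[i], t);
    return result;
}
```

The pragma applies to every call of `lerp`, so there is no way to say "inline it in this loop but not elsewhere", which is the gap described above.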