512 bit static array to vector

Bruce Carneal bcarneal at gmail.com
Sun Jun 19 19:43:29 UTC 2022


On Sunday, 19 June 2022 at 16:56:17 UTC, kinke wrote:
> On Sunday, 19 June 2022 at 14:13:45 UTC, Bruce Carneal wrote:
>> 1) LDC requires more instructions at 512 bits. At 256 
>> (x86-64-v3) they're the same.
>
> Different results (actually using a 512-bit move, not 2x256) 
> with `-mattr=avx512bw`. I guess LLVM makes performance 
> assumptions for the provided CPU and prefers 256-bit 
> instructions.

Note that LLVM/LDC chooses a 512-bit, 2-instruction ld/st sequence 
for a2vUnion given x86-64-v4 as the target, but goes for a 256-bit 
wide, 4-instruction ld/st sequence in a2vArray.

As you note, -mattr=avx512bw forces a2vArray into the 2-instruction 
form, but apparently some difference in the IR presented to LLVM (?) 
enables the choice of the shorter sequence for a2vUnion in either 
case.

Just curious.  Thanks for taking a look and for highlighting the 
workaround (specify avx512bw explicitly).



More information about the digitalmars-d-ldc mailing list