Using YMM registers causes an undefined label error
Rumbu
rumbu at rumbu.ro
Sat Mar 6 10:45:08 UTC 2021
On Friday, 5 March 2021 at 21:47:49 UTC, z wrote:
> On Friday, 5 March 2021 at 16:10:02 UTC, Rumbu wrote:
>> First of all, in the 64-bit ABI, parameters are not passed on
>> the stack, so a[RBP] is nonsense.
>>
>> void complement32(simdbytes* a, simdbytes* b)
>>
>> a is in RCX, b is in RDX on Windows
>> a is in RDI, b is in RSI on Linux
> I'm confused. With your help I've been able to find the
> function calling convention, but in LDC-generated code I
> sometimes see the layout being reversed. (The function I was
> looking at is a 7-argument function, all pointers. The
> first argument is on the stack, the seventh and last is in RCX.)
> The offsets don't seem to make sense either (first argument
> at ss:[rsp+38], second at ss:[rsp+30], and third at ss:[rsp+28]).
>
>> Secondly, there is no such thing as movaps YMMX, [RAX], but
>> vmovaps YMM3, [RAX]
>> Same for vxorps, but there are 3 operands, not 2.
> You're absolutely right, but apparently it only accepts the
> two-operand version from SSE.
> Other AVX/AVX2/AVX512 instructions that have «v» prefixed
> aren't recognized either("Error: unknown opcode vmovaps"), is
> AVX(2) with YMM registers supported for «asm{}» statements?
I just made some tests, and it seems that D has invented its own
calling convention. And it's not documented. If you decorate your
function with extern(C), it should respect the x86-64 ABI
conventions. This is what I got for a 7-parameter function. The
two compilers seem to do the same thing:
param no.   extern(C)   extern(D)
1           RCX         RSP + 56
2           RDX         RSP + 48
3           R8          RSP + 40
4           R9          R9
5           RSP + 40    R8
6           RSP + 48    RDX
7           RSP + 56    RCX
I would stick to extern(C). The extern(D) convention seems
completely illogical: the first 3 parameters are pushed on the
stack from left to right, but if there are fewer than 4
parameters, they are passed in registers. WTF.
Note: tested on Windows. On Linux, both conventions will probably
use the registers of the Linux ABI and will not reserve 32
bytes on the stack.
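To make the comparison concrete, here is a minimal sketch of the kind of test function I mean. The names firstC and firstD are made up for illustration, and the register/stack comments only repeat what the table above shows for Windows; I have not checked other platforms or compiler versions. The point is that only the linkage attribute changes, while the D-level behaviour stays the same:

```d
// Hypothetical 7-pointer-parameter functions; only the linkage differs.
extern(C) void* firstC(void* p1, void* p2, void* p3, void* p4,
                       void* p5, void* p6, void* p7)
{
    // Windows x64, per the table above: p1..p4 arrive in RCX, RDX, R8, R9;
    // p5..p7 are at RSP+40, RSP+48, RSP+56 on entry.
    return p1;
}

extern(D) void* firstD(void* p1, void* p2, void* p3, void* p4,
                       void* p5, void* p6, void* p7)
{
    // Per the table above (observed on Windows): p1..p3 on the stack,
    // p4 in R9, p5 in R8, p6 in RDX, p7 in RCX.
    return p1;
}

void main()
{
    int[7] v;
    // Both calls look identical in D source; the compiler handles the
    // differing parameter placement. It only matters inside asm blocks.
    void* c = firstC(&v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]);
    void* d = firstD(&v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]);
    assert(c == cast(void*) &v[0]);
    assert(d == cast(void*) &v[0]);
}
```

If you write the body in asm, the extern(C) variant is the one where you can rely on the documented register assignment.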
Now, on the other hand, it seems that LDC is one step behind DMD
because - you are right - it doesn't support AVX2 instructions
operating on YMM registers.