Using YMM registers causes an undefined label error

Rumbu rumbu at rumbu.ro
Sat Mar 6 10:45:08 UTC 2021


On Friday, 5 March 2021 at 21:47:49 UTC, z wrote:
> On Friday, 5 March 2021 at 16:10:02 UTC, Rumbu wrote:
>> First of all, in the 64-bit ABI the first parameters are passed 
>> in registers, not on the stack, therefore a[RBP] is nonsense.
>>
>> void complement32(simdbytes* a, simdbytes* b)
>>
>> a is in RCX, b is in RDX on Windows
>> a is in RDI, b is in RSI on Linux
> I'm confused. With your help I've been able to find the 
> function calling convention, but in LDC-generated code I 
> sometimes see the layout reversed (the function I was looking 
> at is a 7-argument function, all pointers: the first argument 
> is on the stack, the seventh and last is in RCX), and the 
> offsets don't seem to make sense either (first argument at 
> ss:[rsp+38], second at ss:[rsp+30], and third at ss:[rsp+28]).
>
>> Secondly, there is no such thing as movaps YMMX, [RAX]; it is 
>> vmovaps YMM3, [RAX].
>> Same for vxorps, which takes 3 operands, not 2.
> You're absolutely right, but apparently it only accepts the 
> two-operand version from SSE.
> Other AVX/AVX2/AVX-512 instructions with the «v» prefix aren't 
> recognized either ("Error: unknown opcode vmovaps"). Is AVX(2) 
> with YMM registers supported in «asm{}» statements?


I just ran some tests, and it seems that D has invented its own 
calling convention, and it's not documented. If you decorate your 
function with extern(C), it should respect the x86-64 ABI 
conventions. This is what I got for a 7-parameter function; both 
compilers (DMD and LDC) seem to do the same thing. Note that the 
offsets you saw are in hex: 0x38 = 56, 0x30 = 48, 0x28 = 40, 
which matches the extern(D) column below:

Param   extern(C)   extern(D)
1       RCX         RSP + 56
2       RDX         RSP + 48
3       R8          RSP + 40
4       R9          R9
5       RSP + 40    R8
6       RSP + 48    RDX
7       RSP + 56    RCX

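I would stick to extern(C). The extern(D) convention seems 
completely illogical: judging by the table, it is the extern(C) 
assignment applied to the parameters in reverse order, so the 
first 3 parameters land on the stack from left to right (the first 
one at the highest offset), and with at most 4 parameters 
everything is passed in registers. WTF.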

Note: tested on Windows only. On Linux, both conventions will 
probably use the System V registers (RDI, RSI, RDX, RCX, R8, R9) 
and will not reserve the 32 bytes of shadow space on the stack.

On the other hand, it seems that LDC is one step behind DMD here: 
you are right, its asm{} blocks don't support AVX/AVX2 
instructions operating on YMM registers.
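For reference, here is a minimal sketch of what the corrected function could look like with DMD, under stated assumptions: `simdbytes` is taken to be a 32-byte buffer (its real definition isn't shown in this thread), XOR-ing `*a` with `*b` is only a hypothetical stand-in for whatever complement32 is meant to do, and the CPU must support AVX. It uses the v-prefixed VEX forms and the 3-operand vxorps, and it loads the pointers by parameter name instead of hard-coding RCX/RDX or RDI/RSI, so the same asm should work under either ABI. Given the above, LDC's asm{} would be expected to reject it.

```d
// Sketch only, for DMD's inline assembler on x86-64 with an AVX CPU.
// Assumption: simdbytes is a 32-byte buffer, i.e. one YMM register wide;
// the thread does not show its actual definition.
alias simdbytes = ubyte[32];

// Hypothetical semantics: *a ^= *b over all 32 bytes.
// extern(C) so the parameters follow the platform C ABI
// (RCX/RDX on Windows, RDI/RSI on Linux).
extern(C) void complement32(simdbytes* a, simdbytes* b)
{
    asm
    {
        // Load the pointers by parameter name; the compiler resolves
        // where a and b live, so no ABI register is hard-coded.
        mov RAX, a;
        mov RDX, b;
        vmovups YMM0, [RAX];       // YMM loads need the v-prefixed VEX form
        vmovups YMM1, [RDX];       // vmovups: no 32-byte alignment required
        vxorps  YMM0, YMM0, YMM1;  // three operands: dest, src1, src2
        vmovups [RAX], YMM0;       // store the result back through a
    }
}

void main()
{
    simdbytes x = 0xFF;  // every element set to 0xFF
    simdbytes y = 0x0F;  // every element set to 0x0F
    complement32(&x, &y);
    foreach (v; x)
        assert(v == 0xF0);  // 0xFF ^ 0x0F
}
```

If code like this runs in a hot loop next to SSE code, you would probably also want a vzeroupper before leaving the asm block, to avoid the AVX/SSE transition penalty.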




More information about the Digitalmars-d-learn mailing list