Can I get a more in-depth guide about the inline assembler?
ZILtoid1991 via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri Jun 3 18:44:38 PDT 2016
On Wednesday, 1 June 2016 at 23:23:49 UTC, ZILtoid1991 wrote:
> Here's the assembly code for my alpha-blending routine:
> ubyte[4] src = *cast(ubyte[4]*)(palette.ptr + 4 * *c);
> ubyte[4] *p = cast(ubyte[4]*)(workpad + (offsetX + x)*4 + offsetY);
> asm{ //moving the values to their destinations
> movd MM0, p;
> movd MM1, src;
> movq MM5, alpha;
> movq MM7, alphaMMXmul_const1;
> movq MM6, alphaMMXmul_const2;
> punpcklbw MM2, MM0;
> punpcklbw MM3, MM1;
>
> paddw MM6, MM5; //1 + alpha
> psubw MM7, MM5; //256 - alpha
>
> pmulhuw MM2, MM6; //src * (1 + alpha)
> pmulhuw MM3, MM7; //dest * (256 - alpha)
> paddw MM3, MM2; //(src * (1 + alpha)) + (dest * (256 - alpha))
> psrlw MM3, 8; //(src * (1 + alpha)) + (dest * (256 - alpha)) / 256
> //moving the result to its place;
> packuswb MM4, MM3;
> movd p, MM4;
> emms;
> }
>
> The two constants being referred here:
> static immutable ushort[4] alphaMMXmul_const1 = [256,256,256,256];
> static immutable ushort[4] alphaMMXmul_const2 = [1,1,1,1];
>
> alpha is a ushort[4] containing the alpha value four times.
>
> After some debugging, I found out that the p pointer becomes
> null at the end instead of pointing to a value. I have no
> experience with using in-line assemblers (although I made a few
> Hello World programs for MS-Dos with a stand-alone assembler),
> so I don't know when and how the compiler will interpret the
> types from D.
Problem solved. Current assembly code:
asm{
//moving the values to their destinations
mov EBX, p[EBP];
movd MM0, src;
movd MM1, [EBX];
movq MM5, alpha;
movq MM7, alphaMMXmul_const256;
movq MM6, alphaMMXmul_const1;
pxor MM2, MM2;
punpcklbw MM0, MM2;
punpcklbw MM1, MM2;
paddusw MM6, MM5; //1 + alpha
psubusw MM7, MM5; //256 - alpha
pmullw MM0, MM6; //src * (1 + alpha)
pmullw MM1, MM7; //dest * (256 - alpha)
paddusw MM0, MM1; //(src * (1 + alpha)) + (dest * (256 - alpha))
psrlw MM0, 8; //((src * (1 + alpha)) + (dest * (256 - alpha))) / 256
//moving the result to its place;
//pxor MM2, MM2;
packuswb MM0, MM2;
movd [EBX], MM0;
emms;
}
The actual problem was the poor documentation of the MMX
instructions, since MMX never really caught on, together with the
disappearance of assembly programming from the mainstream. The end
result is a quick alpha-blending routine that has barely any
performance penalty compared to just copying the pixels. I
currently have no plans to translate the whole sprite-displaying
algorithm to assembly; instead, I'll work on the editor for the
game engine.
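
For anyone who wants to check the MMX routine against plain D, here
is a minimal scalar sketch of the arithmetic described in the asm
comments above. The names blendChannel and blendPixel are
hypothetical and not part of the engine, and the sketch assumes a
single 8-bit alpha shared by all four channels, whereas the asm
reads a ushort[4] holding the alpha value four times. It is meant
as a reference for comparison, not as the actual implementation.

```d
/// Blends one 8-bit channel with the formula from the asm comments:
/// ((src * (1 + alpha)) + (dest * (256 - alpha))) / 256
/// (hypothetical helper, for illustration only)
ubyte blendChannel(ubyte src, ubyte dest, ubyte alpha) pure nothrow @nogc @safe
{
    // The largest possible sum is 255 * 257 = 65535, so it still fits
    // into the 16-bit lanes the MMX version works with.
    immutable uint sum = src * (1u + alpha) + dest * (256u - alpha);
    // Same as the psrlw by 8 in the asm: divide by 256.
    return cast(ubyte)(sum >> 8);
}

/// Blends a four-channel pixel, one channel at a time.
/// Assumption: one alpha value is applied to every channel.
ubyte[4] blendPixel(const ubyte[4] src, const ubyte[4] dest, ubyte alpha)
    pure nothrow @nogc @safe
{
    ubyte[4] result;
    foreach (i; 0 .. 4)
        result[i] = blendChannel(src[i], dest[i], alpha);
    return result;
}
```

With alpha = 0 the destination comes back unchanged, and with
alpha = 255 the source replaces it completely, so comparing the
output of the asm block with blendPixel on a few pixels should
show quickly whether the two agree.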