Can I get a more in-depth guide about the inline assembler?

Fri Jun 3 18:44:38 PDT 2016

On Wednesday, 1 June 2016 at 23:23:49 UTC, ZILtoid1991 wrote:
> Here's the assembly code for my alpha-blending routine:
> ubyte[4] src = *cast(ubyte[4]*)(palette.ptr + 4 * *c);
> ubyte[4] *p = cast(ubyte[4]*)(workpad + (offsetX + x)*4 + offsetY);
> asm{	//moving the values to their destinations
> movd	MM0, p;
> movd	MM1, src;
> movq	MM5, alpha;
> movq	MM7, alphaMMXmul_const1;
> movq	MM6, alphaMMXmul_const2;
> punpcklbw	MM2, MM0;
> punpcklbw	MM3, MM1;
>
> paddw	MM6, MM5;	//1 + alpha
> psubw	MM7, MM5;	//256 - alpha
>
> pmulhuw	MM2, MM6;	//src * (1 + alpha)
> pmulhuw MM3, MM7;	//dest * (256 - alpha)
> paddw	MM3, MM2;	//(src * (1 + alpha)) + (dest * (256 - alpha))
> psrlw	MM3, 8;		//((src * (1 + alpha)) + (dest * (256 - alpha))) / 256
> //moving the result to its place;
> packuswb	MM4, MM3;
> movd	p, MM4;
> emms;
> }
>
> The two constants being referred here:
> static immutable ushort[4] alphaMMXmul_const1 = [256,256,256,256];
> static immutable ushort[4] alphaMMXmul_const2 = [1,1,1,1];
>
> alpha is a ushort[4] containing the alpha value four times.
>
> After some debugging, I found out that the p pointer becomes 
> null at the end instead of pointing to a value. I have no 
> experience with inline assemblers (although I made a few 
> Hello World programs for MS-DOS with a stand-alone assembler), 
> so I don't know when and how the compiler will interpret the 
> types from D.

Problem solved. Current assembly code:

asm{
//moving the values to their destinations
mov		EBX, p[EBP];
movd	MM0, src;
movd	MM1, [EBX];

movq	MM5, alpha;			
movq	MM7, alphaMMXmul_const256;
movq	MM6, alphaMMXmul_const1;
pxor	MM2, MM2;
punpcklbw	MM0, MM2;
punpcklbw	MM1, MM2;

paddusw	MM6, MM5;	//1 + alpha
psubusw	MM7, MM5;	//256 - alpha

pmullw	MM0, MM6;	//src * (1 + alpha)
pmullw	MM1, MM7;	//dest * (256 - alpha)
paddusw	MM0, MM1;	//(src * (1 + alpha)) + (dest * (256 - alpha))
psrlw	MM0, 8;		//((src * (1 + alpha)) + (dest * (256 - alpha))) / 256
//moving the result to its place;
//pxor	MM2, MM2;
packuswb	MM0, MM2;

movd	[EBX], MM0;

emms;
}

The actual problem was that MMX instructions are poorly 
documented: MMX never really caught on, and assembly programming 
has largely disappeared from the mainstream. Compared with the 
first version, the code now loads the destination pixel through 
EBX instead of moving the pointer itself into MM0, zeroes MM2 
before unpacking, and multiplies with pmullw instead of pmulhuw. 
The end result is a quick alpha-blending routine with barely any 
performance penalty compared to just copying the pixels. I 
currently have no plans to translate the whole sprite-displaying 
algorithm to assembly; instead I'll work on the editor for the 
game engine.
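
For anyone who wants to check the MMX block against plain D, here is a minimal scalar sketch of the per-channel arithmetic it appears to perform. The names blendChannel and blendPixel are hypothetical and not part of the engine, and it assumes a single 8-bit alpha applied to all four channels, as the alpha array in the post suggests.

```d
// Scalar reference for the arithmetic in the MMX block above.
// blendChannel and blendPixel are hypothetical helper names (assumptions).
ubyte blendChannel(ubyte src, ubyte dest, ubyte alpha) pure nothrow @nogc @safe
{
    // Same formula as the asm comments:
    // (src * (1 + alpha) + dest * (256 - alpha)) / 256
    // The sum is at most 255 * 257 = 65535, so it fits in the 16-bit
    // lanes the MMX code uses and the saturating add never clips.
    uint sum = src * (1u + alpha) + dest * (256u - alpha);
    return cast(ubyte)(sum >> 8); // psrlw by 8 is the division by 256
}

// Applies the same blend to all four channels of one pixel,
// mirroring what the packed MMX instructions do in parallel.
ubyte[4] blendPixel(ubyte[4] src, ubyte[4] dest, ubyte alpha) pure nothrow @nogc @safe
{
    ubyte[4] result;
    foreach (i; 0 .. 4)
        result[i] = blendChannel(src[i], dest[i], alpha);
    return result;
}
```

Comparing the output of blendPixel with the pixel written by the asm block for a few inputs is a cheap way to catch a wrong instruction choice, such as taking the high half of a product instead of the low half.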