Optimization tips for alpha blending / rasterization loop

Craig Dillabaugh cdillaba at cg.scs.carleton.ca
Thu Nov 21 19:36:37 PST 2013


On Friday, 22 November 2013 at 02:24:56 UTC, Mikko Ronkainen
wrote:
> I'm trying to learn some software rasterization stuff. Here's 
> what I'm doing:
>
> 32-bit DMD on 64-bit Windows
> Framebuffer is an int[], each int is a pixel of format 
> 0xAABBGGRR (this seems fastest to my CPU + GPU)
> Framebuffer is thrown as is to OpenGL, rendered as textured 
> quad.
>
> Here's a simple rectangle drawing algorithm that also does 
> alpha blending. I tried quite a few variations (for example 
> without the byte casting, using ints and shifting instead), but 
> none was as fast as this:
>
> class Framebuffer
> {
>   int[] data;
>   int width;
>   int height;
> }
>
> void drawRectangle(Framebuffer framebuffer, int x, int y,
>                    int width, int height, int color)
> {
>   foreach (i; y .. y + height)
>   {
>     int start = x + i * framebuffer.width;
>
>     foreach(j; 0 .. width)
>     {
>       byte* bg = cast(byte*)&framebuffer.data[start + j];
>       byte* fg = cast(byte*)&color;
>
>       int alpha = (fg[3] & 0xff) + 1;
>       int inverseAlpha = 257 - alpha;
>
>       bg[0] = cast(byte)((alpha * (fg[0] & 0xff)
>                 + inverseAlpha * (bg[0] & 0xff)) >> 8);
>       bg[1] = cast(byte)((alpha * (fg[1] & 0xff)
>                 + inverseAlpha * (bg[1] & 0xff)) >> 8);
>       bg[2] = cast(byte)((alpha * (fg[2] & 0xff)
>                 + inverseAlpha * (bg[2] & 0xff)) >> 8);
>       bg[3] = cast(byte)0xff;
>     }
>   }
> }
>
> I would like to make this as fast as possible as it is done for 
> almost every pixel every frame.
>
> Am I doing something stupid that is slowing things down? Cache 
> thrashing, or even branch prediction errors? :)
> Is this kind of algorithm + data even a candidate for SIMD usage?
> Even if fg is of type byte, fg[0] would return a greater value 
> than 0xff. It needs to be (fg[0] & 0xff) to make things work. I 
> wonder why?

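Do you want to use a ubyte instead of a byte here? byte is signed
in D, so any channel value above 0x7f is negative and gets
sign-extended when it is promoted to int. That is why you need
the & 0xff mask.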
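
To make the difference concrete, here is a small sketch. It assumes
a little-endian machine (as on x86), and the function names and the
colour value are made up for illustration. It should build with
something like "dmd -unittest -main".

```d
// Read the lowest byte of a packed 0xAABBGGRR pixel in two ways.
// Assumes a little-endian CPU, so index 0 is the red channel.

int redViaByte(uint color)
{
    byte* p = cast(byte*)&color;   // signed view, as in the original code
    return p[0];                   // promoted to int with sign extension
}

int redViaUbyte(uint color)
{
    ubyte* p = cast(ubyte*)&color; // unsigned view
    return p[0];                   // promoted to int, always 0 .. 255
}

unittest
{
    uint color = 0xFF8040C0;       // hypothetical pixel, red = 0xC0 = 192

    assert(redViaByte(color) == -64);          // 0xC0 read as signed
    assert((redViaByte(color) & 0xff) == 192); // the mask repairs it
    assert(redViaUbyte(color) == 192);         // no mask needed
}
```

With ubyte pointers the three (x & 0xff) masks in the inner loop
should no longer be necessary.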

Also, for your alpha channel:

int alpha = (fg[3] & 0xff) + 1;
int inverseAlpha = 257 - alpha;

If fg[3] == 0 then alpha = 1 and inverseAlpha = 256, which is out
of the range that can be stored in a ubyte, so that value has to
stay in an int.
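
A rough sketch of the arithmetic for one channel, using the same
formula as the quoted code. blendChannel is a hypothetical helper,
not something from the original post, and the test values are
arbitrary:

```d
// One channel of the blend from the quoted code, with ubyte inputs.
// The weights are kept in ints: alpha is 1 .. 256, inverseAlpha 256 .. 1.
ubyte blendChannel(ubyte fg, ubyte bg, ubyte a)
{
    int alpha = a + 1;
    int inverseAlpha = 257 - alpha;
    // Largest possible sum is 257 * 255 = 65535, so >> 8 never exceeds 255.
    return cast(ubyte)((alpha * fg + inverseAlpha * bg) >> 8);
}

unittest
{
    assert(blendChannel(200, 50, 0) == 50);    // transparent: background kept
    assert(blendChannel(200, 50, 255) == 200); // opaque: foreground wins
    assert(blendChannel(255, 0, 127) == 127);  // roughly half

    int inverseAlpha = 256;
    assert(cast(ubyte)inverseAlpha == 0);      // truncated if narrowed to ubyte
}
```

So the per-channel result always fits in a ubyte, but the two
weights themselves do not.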

Craig

