Optimization tips for alpha blending / rasterization loop

Mikko Ronkainen mikoro at iki.fi
Fri Nov 22 06:55:52 PST 2013


> Do you want to use a ubyte instead of a byte here?

Yes, that was a silly mistake. Switching to ubyte removed the need 
for all the masking operations, and that gave the biggest speedup.
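
A small illustration of why the masks were needed. It assumes the pixel bytes were previously read through a signed `byte*`. D promotes a `byte` to `int` with sign extension, so channel values of 128 and above come out negative and have to be masked with `& 0xff`. A `ubyte` is zero-extended instead, so the mask becomes a no-op. Both helper functions are hypothetical.

```d
// Promote a pixel channel to int, as the blending arithmetic does.
int promoteSigned(byte b)
{
    return b; // sign extension: the bit pattern 0xC8 (200) becomes -56
}

int promoteUnsigned(ubyte b)
{
    return b; // zero extension: the bit pattern 0xC8 stays 200
}
```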

> Also, for your alpha channel:
>
> int alpha = (fg[3] & 0xff) + 1;
> int inverseAlpha = 257 - alpha;
>
> If fg[3] = 0 then inverseAlpha = 256, which is out of the range
> that can be stored in a ubyte.

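I think my logic is correct. The calculations are done with 32-bit 
integers, and only the final result is cast (truncated) to a ubyte. 
The +1 compensates for the >> 8, which divides by 256 rather than 
255. Since alpha + inverse alpha is always 257, the largest possible 
sum is 257 * 255 = 65535, and 65535 >> 8 = 255. The result therefore 
always fits in a ubyte, even when the inverse alpha is 256.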
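
Here is a minimal sketch of the per-channel arithmetic. It is pulled out of `drawRectangle` into a hypothetical `blendChannel` helper so that the range argument can be checked exhaustively. With `a == 0` it returns `bg` exactly, with `a == 255` it returns `fg` exactly, and no result ever exceeds 255.

```d
/// Blends one 8-bit channel the same way as the drawRectangle inner loop.
/// `a` is the foreground alpha (0..255). alpha = a + 1 lies in 1..256 and
/// invAlpha = 257 - alpha lies in 1..256, so alpha + invAlpha == 257.
ubyte blendChannel(ubyte fg, ubyte bg, ubyte a)
{
    immutable uint alpha = a + 1u;
    immutable uint invAlpha = 257 - alpha;
    // The largest possible sum is 257 * 255 = 65535, and 65535 >> 8 == 255,
    // so the final cast never discards significant bits.
    immutable uint sum = alpha * fg + invAlpha * bg;
    assert(sum <= 65_535);
    return cast(ubyte)(sum >> 8);
}
```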

class Framebuffer
{
    uint[] data;    // pixels stored row by row, one uint per pixel
    uint width;
    uint height;
}

// Blends a solid-color rectangle onto the framebuffer.
// color holds alpha in its most significant byte; on a little-endian
// machine that byte is fg[3]. No clipping is done, so the rectangle
// must lie inside the framebuffer.
void drawRectangle(Framebuffer framebuffer, uint x, uint y, uint width,
                   uint height, uint color)
{
    const ubyte* fg = cast(const ubyte*)&color;
    immutable uint alpha = fg[3] + 1;         // 1..256
    immutable uint invAlpha = 257 - alpha;    // 1..256
    immutable uint afg0 = alpha * fg[0];      // loop-invariant products
    immutable uint afg1 = alpha * fg[1];
    immutable uint afg2 = alpha * fg[2];

    foreach (i; y .. y + height)
    {
        immutable uint start = x + i * framebuffer.width;

        foreach (j; 0 .. width)
        {
            ubyte* bg = cast(ubyte*)&framebuffer.data[start + j];

            bg[0] = cast(ubyte)((afg0 + invAlpha * bg[0]) >> 8);
            bg[1] = cast(ubyte)((afg1 + invAlpha * bg[1]) >> 8);
            bg[2] = cast(ubyte)((afg2 + invAlpha * bg[2]) >> 8);
            bg[3] = 0xff;   // the result is always opaque
        }
    }
}

Can this be made faster with SIMD? (I don't know much about it; 
maybe the data and algorithm don't fit it?)
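
One way to get some data parallelism without `core.simd` is SWAR (SIMD within a register). This is a swapped-in technique, not anything from the code above. Two 8-bit channels that sit 16 bits apart in a `uint` can share one multiply, because each partial sum stays at or below 65535 and never carries into its neighbour. The sketch below assumes the same byte layout as `drawRectangle` on a little-endian machine and should give bit-identical results to the per-byte loop. The name `blendPixelSwar` is hypothetical. Real SIMD, for example `core.simd`, or letting LDC/GDC auto-vectorize a simple loop, might go further by blending several pixels per instruction. That is not shown here.

```d
/// SWAR sketch: blends a whole pixel (alpha in bits 24-31, colour channels
/// in bits 0-7, 8-15 and 16-23) using shared multiplies for channel pairs.
uint blendPixelSwar(uint fg, uint bg)
{
    immutable uint alpha = (fg >> 24) + 1;   // 1..256
    immutable uint invAlpha = 257 - alpha;   // 1..256

    // Channels in bits 0-7 and 16-23 are blended together: each partial
    // sum is at most 257 * 255 = 65535, so the low one cannot carry into
    // the high one, and the whole expression fits in 32 bits.
    immutable uint lowHigh = (((fg & 0x00FF00FF) * alpha
                             + (bg & 0x00FF00FF) * invAlpha) >> 8) & 0x00FF00FF;

    // The middle channel on its own: its sum occupies bits 8-23, and after
    // >> 8 the blended value is the byte in bits 8-15.
    immutable uint middle = (((fg & 0x0000FF00) * alpha
                            + (bg & 0x0000FF00) * invAlpha) >> 8) & 0x0000FF00;

    return 0xFF000000 | lowHigh | middle;   // destination forced opaque
}
```

Inside `drawRectangle` the `fg` products are loop-invariant and can be hoisted, as the original already does with `afg0`..`afg2`. That leaves two multiplies per pixel instead of three. Whether this actually beats the byte-pointer loop depends on the compiler and should be measured.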

Can this be parallelized with any real gains?
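
Here is a minimal sketch using `std.parallelism`. It assumes the framebuffer is passed as a plain `uint[]` plus its row stride, standing in for the `Framebuffer` fields, and it uses the hypothetical name `drawRectangleParallel`. Each row writes to a disjoint slice, so rows can be blended concurrently without locks.

```d
import std.parallelism : parallel;
import std.range : iota;

/// Row-parallel version of the rectangle blend (same arithmetic as the
/// scalar loop; alpha in the top byte, destination forced opaque).
void drawRectangleParallel(uint[] data, uint stride, uint x, uint y,
                           uint width, uint height, uint color)
{
    immutable uint alpha = (color >> 24) + 1;   // 1..256
    immutable uint invAlpha = 257 - alpha;      // 1..256

    // Rows are distributed across the default task pool's worker threads.
    foreach (i; parallel(iota(y, y + height)))
    {
        auto row = data[x + i * stride .. x + i * stride + width];
        foreach (ref px; row)
        {
            uint result = 0xFF000000;
            for (uint shift = 0; shift <= 16; shift += 8)
            {
                immutable uint f = (color >> shift) & 0xFF;
                immutable uint b = (px >> shift) & 0xFF;
                result |= ((alpha * f + invAlpha * b) >> 8) << shift;
            }
            px = result;
        }
    }
}
```

Real gains are likely only for large rectangles. Each work unit has scheduling overhead, and a blend this cheap may be limited by memory bandwidth rather than arithmetic, so it is worth timing against the single-threaded loop.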


More information about the Digitalmars-d-learn mailing list