Optimization tips for alpha blending / rasterization loop

Andrea Fontana nospam at example.com
Fri Nov 22 00:44:05 PST 2013


On Friday, 22 November 2013 at 03:36:38 UTC, Craig Dillabaugh 
wrote:
> On Friday, 22 November 2013 at 02:24:56 UTC, Mikko Ronkainen
> wrote:
>> I'm trying to learn some software rasterization stuff. Here's 
>> what I'm doing:
>>
>> 32-bit DMD on 64-bit Windows
>> Framebuffer is an int[], each int is a pixel of format 
>> 0xAABBGGRR (this seems fastest to my CPU + GPU)
>> Framebuffer is thrown as is to OpenGL, rendered as textured 
>> quad.
>>
>> Here's a simple rectangle drawing algorithm that also does 
>> alpha blending. I tried quite a few variations (for example 
>> without the byte casting, using ints and shifting instead), 
>> but none was as fast as this:
>>
>> class Framebuffer
>> {
>>  int[] data;
>>  int width;
>>  int height;
>> }
>>
>> void drawRectangle(Framebuffer framebuffer, int x, int y, int 
>> width, int height, int color)
>> {
>>  foreach (i; y .. y + height)
>>  {
>>    int start = x + i * framebuffer.width;
>>
>>    foreach(j; 0 .. width)
>>    {
>>      byte* bg = cast(byte*)&framebuffer.data[start + j];
>>      byte* fg = cast(byte*)&color;
>>
>>      int alpha = (fg[3] & 0xff) + 1;
>>      int inverseAlpha = 257 - alpha;
>>
>>      bg[0] = cast(byte)((alpha * (fg[0] & 0xff) + inverseAlpha 
>> * (bg[0] & 0xff)) >> 8);
>>      bg[1] = cast(byte)((alpha * (fg[1] & 0xff) + inverseAlpha 
>> * (bg[1] & 0xff)) >> 8);
>>      bg[2] = cast(byte)((alpha * (fg[2] & 0xff) + inverseAlpha 
>> * (bg[2] & 0xff)) >> 8);
>>      bg[3] = cast(byte)0xff;
>>    }
>>  }
>> }
>>
>> I would like to make this as fast as possible as it is done 
>> for almost every pixel every frame.
>>
>> Am I doing something stupid that is slowing things down? Cache 
>> thrashing, or even branch prediction errors? :)
>> Is this kind of algorithm + data even a candidate for SIMD 
>> usage?
>> Even if fg is of type byte, fg[0] would return a greater value 
>> than 0xff. It needs to be (fg[0] & 0xff) to make things work. 
>> I wonder why?
>
> Do you want to use a ubyte instead of a byte here?
>
> Also, for your alpha channel:
>
> int alpha = (fg[3] & 0xff) + 1;
> int inverseAlpha = 257 - alpha;
>
> If fg[3] = 0 then inverseAlpha = 256, which is out of the range
> that can be stored in a ubyte.
>
> Craig

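If I'm right, all of these lines:

byte* fg = cast(byte*)&color;
int alpha = (fg[3] & 0xff) + 1;
int inverseAlpha = 257 - alpha;

are loop-invariant, so you can move them outside both foreach 
loops. Since color is a runtime parameter, use const (or 
immutable) locals for them; an enum would only work if color 
were known at compile time.

You can also pre-calculate this for each channel:

alpha * (fg[0] & 0xff)

before the foreach.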
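
A minimal sketch of the hoisting suggested above, untested for speed. 
It keeps the Framebuffer class and the blend arithmetic from the 
original post. The names fgR, fgG and fgB, the row slicing, and the 
switch from byte to ubyte (which makes the & 0xff masks unnecessary) 
are my own assumptions, not something from the thread. Like the 
original, it assumes a little-endian CPU so that bg[0] is the red 
byte of 0xAABBGGRR.

```d
class Framebuffer
{
    int[] data;
    int width;
    int height;
}

void drawRectangle(Framebuffer framebuffer, int x, int y,
                   int width, int height, int color)
{
    // Everything derived from color is loop-invariant, so compute it
    // once per call. These are runtime values, hence immutable locals.
    immutable uint c = cast(uint) color;
    immutable uint alpha = (c >> 24) + 1;          // 1 .. 256
    immutable uint inverseAlpha = 257 - alpha;     // 256 .. 1
    immutable uint fgR = alpha * (c & 0xff);           // pre-multiplied red
    immutable uint fgG = alpha * ((c >> 8) & 0xff);    // pre-multiplied green
    immutable uint fgB = alpha * ((c >> 16) & 0xff);   // pre-multiplied blue

    foreach (i; y .. y + height)
    {
        immutable int start = x + i * framebuffer.width;

        // Walk one row of the rectangle, pixel by pixel.
        foreach (ref px; framebuffer.data[start .. start + width])
        {
            // ubyte is unsigned, so no & 0xff mask is needed.
            ubyte* bg = cast(ubyte*) &px;

            bg[0] = cast(ubyte)((fgR + inverseAlpha * bg[0]) >> 8);
            bg[1] = cast(ubyte)((fgG + inverseAlpha * bg[1]) >> 8);
            bg[2] = cast(ubyte)((fgB + inverseAlpha * bg[2]) >> 8);
            bg[3] = 0xff;
        }
    }
}

void main()
{
    auto fb = new Framebuffer;
    fb.width = 4;
    fb.height = 4;
    fb.data = new int[16];
    fb.data[] = cast(int) 0xFF000000;   // opaque black background

    // Opaque red over black must give opaque red inside the rectangle
    // and leave pixels outside it untouched.
    drawRectangle(fb, 1, 1, 2, 2, cast(int) 0xFF0000FF);
    assert(cast(uint) fb.data[5] == 0xFF0000FF);
    assert(cast(uint) fb.data[0] == 0xFF000000);
}
```

Whether this is actually faster than the original depends on what 
the compiler already hoists for you, so measure it before keeping it.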



