Optimization tips for alpha blending / rasterization loop
Andrea Fontana
nospam at example.com
Fri Nov 22 00:50:55 PST 2013
On Friday, 22 November 2013 at 08:44:06 UTC, Andrea Fontana wrote:
> On Friday, 22 November 2013 at 03:36:38 UTC, Craig Dillabaugh
> wrote:
>> On Friday, 22 November 2013 at 02:24:56 UTC, Mikko Ronkainen
>> wrote:
>>> I'm trying to learn some software rasterization stuff. Here's
>>> what I'm doing:
>>>
>>> 32-bit DMD on 64-bit Windows
>>> Framebuffer is an int[], each int is a pixel of format
>>> 0xAABBGGRR (this seems fastest to my CPU + GPU)
>>> Framebuffer is thrown as is to OpenGL, rendered as textured
>>> quad.
>>>
>>> Here's a simple rectangle drawing algorithm that also does
>>> alpha blending. I tried quite a few variations (for example
>>> without the byte casting, using ints and shifting instead),
>>> but none was as fast as this:
>>>
>>> class Framebuffer
>>> {
>>>     int[] data;
>>>     int width;
>>>     int height;
>>> }
>>>
>>> void drawRectangle(Framebuffer framebuffer, int x, int y,
>>>                    int width, int height, int color)
>>> {
>>>     foreach (i; y .. y + height)
>>>     {
>>>         int start = x + i * framebuffer.width;
>>>
>>>         foreach (j; 0 .. width)
>>>         {
>>>             byte* bg = cast(byte*)&framebuffer.data[start + j];
>>>             byte* fg = cast(byte*)&color;
>>>
>>>             int alpha = (fg[3] & 0xff) + 1;
>>>             int inverseAlpha = 257 - alpha;
>>>
>>>             bg[0] = cast(byte)((alpha * (fg[0] & 0xff)
>>>                 + inverseAlpha * (bg[0] & 0xff)) >> 8);
>>>             bg[1] = cast(byte)((alpha * (fg[1] & 0xff)
>>>                 + inverseAlpha * (bg[1] & 0xff)) >> 8);
>>>             bg[2] = cast(byte)((alpha * (fg[2] & 0xff)
>>>                 + inverseAlpha * (bg[2] & 0xff)) >> 8);
>>>             bg[3] = cast(byte)0xff;
>>>         }
>>>     }
>>> }
>>>
>>> I would like to make this as fast as possible as it is done
>>> for almost every pixel every frame.
>>>
>>> Am I doing something stupid that is slowing things down?
>>> Cache thrashing, or even branch prediction errors? :)
>>> Is this kind of algorithm + data even a candidate for SIMD
>>> usage?
>>> Even if fg is of type byte, fg[0] would return a greater value
>>> than 0xff. It needs to be (fg[0] & 0xff) to make things work.
>>> I wonder why?
>>
>> Do you want to use a ubyte instead of a byte here?
>>
>> Also, for your alpha channel:
>>
>> int alpha = (fg[3] & 0xff) + 1;
>> int inverseAlpha = 257 - alpha;
>>
>> If fg[3] = 0 then inverseAlpha = 256, which is out of the range
>> that can be stored in a ubyte.
>>
>> Craig
>
> If I'm right, all of these lines:
>
> byte* fg = cast(byte*)&color;
> int alpha = (fg[3] & 0xff) + 1;
> int inverseAlpha = 257 - alpha;
>
> are constant, and you can move them outside both foreach loops
> using an enum;
>
> you can also pre-calculate this:
> alpha * (fg[0] & 0xff)
>
> before foreach.
Of course I mean immutable, not enum :)
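
Something like the following is what I have in mind. It is only a rough sketch, not a benchmarked version: it keeps the Framebuffer class and the drawRectangle signature from the original post, moves the per-color values out of both loops as immutable locals, and additionally assumes ubyte pointers instead of byte (as Craig suggested). With byte the channel values are sign-extended when promoted to int, which is why the original needed "& 0xff"; with ubyte the mask can go. The names r, g and b are made up for the example.

```d
class Framebuffer
{
    int[] data;
    int width;
    int height;
}

void drawRectangle(Framebuffer framebuffer, int x, int y,
                   int width, int height, int color)
{
    // Assumption: view the color as four unsigned bytes (0xAABBGGRR in
    // memory order R, G, B, A on a little-endian machine).
    auto fg = cast(ubyte*)&color;

    // Loop-invariant values, computed once per call instead of per pixel.
    immutable int alpha = fg[3] + 1;          // 1 .. 256
    immutable int inverseAlpha = 257 - alpha; // 256 .. 1
    immutable int r = alpha * fg[0];
    immutable int g = alpha * fg[1];
    immutable int b = alpha * fg[2];

    foreach (i; y .. y + height)
    {
        immutable int start = x + i * framebuffer.width;

        foreach (j; 0 .. width)
        {
            auto bg = cast(ubyte*)&framebuffer.data[start + j];

            // Same blend as the original, with the foreground term hoisted.
            bg[0] = cast(ubyte)((r + inverseAlpha * bg[0]) >> 8);
            bg[1] = cast(ubyte)((g + inverseAlpha * bg[1]) >> 8);
            bg[2] = cast(ubyte)((b + inverseAlpha * bg[2]) >> 8);
            bg[3] = 0xff;
        }
    }
}
```

A decent optimizer may already hoist some of this on its own, so it is worth measuring before and after.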