Requesting Help with Optimizing Code

Kyle Ingraham kyle at kyleingraham.com
Thu Apr 8 16:37:57 UTC 2021


On Thursday, 8 April 2021 at 03:25:39 UTC, tsbockman wrote:
> On Thursday, 8 April 2021 at 01:24:23 UTC, Kyle Ingraham wrote:
>> The issue I have is with performance.
>
> This belongs in the "Learn" forum, I think.

In hindsight, you're right. I'll post this sort of question there
in the future.

> Personally, I try to stick to `static foreach` and normal 
> runtime loops for the inner loops of really CPU-intensive stuff 
> like this, so that it's easier to see from the source code what 
> the ASM will look like.

Are compilers able to take plain loops like these and vectorize or
parallelize them automatically?

> The only concrete floating-point type I see mentioned in those 
> functions is `double`, which is almost always extreme overkill 
> for color calculations.

Great tip. I'll think about the precision my data actually needs
before reaching for the first type on the shelf.

> Structure your loops (or functional equivalents) such that the 
> "for each pixel" loop is the outermost, or as far out as you 
> can get it. Do as much as you can with each pixel while it is 
> still in registers or L1 cache. Otherwise, you may end up 
> bottle-necked by memory bandwidth.
>
> Make sure you have optimizations enabled, especially cross 
> module inlining, -O3, the most recent SIMD instruction set you 
> are comfortable requiring (almost everyone in the developed 
> world now has Intel SSE4 or ARM Neon), and fused multiply 
> add/subtract.

Are there any sources you would recommend for learning more about
these techniques, such as code bases or articles? I'll see how far
I can get with creative googling in the meantime.
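
To check that I've understood the loop advice, here is a minimal
sketch of what I think "for each pixel outermost" means. The
`Pixel` type and the two stages (`applyGain`, `clampPixel`) are
made up for illustration and are not from my actual code. The idea
is that every stage runs on a pixel before moving to the next one,
instead of making one full pass over the image per stage.

```d
// Hypothetical pixel type; `float` rather than `double`, per the tip above.
struct Pixel
{
    float r, g, b;
}

// Clamp a single channel to the range [0, 1].
float clamp01(float x) pure nothrow @nogc @safe
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

// Hypothetical stage 1: scale every channel by a gain factor.
Pixel applyGain(Pixel p, float gain) pure nothrow @nogc @safe
{
    return Pixel(p.r * gain, p.g * gain, p.b * gain);
}

// Hypothetical stage 2: clamp every channel to [0, 1].
Pixel clampPixel(Pixel p) pure nothrow @nogc @safe
{
    return Pixel(clamp01(p.r), clamp01(p.g), clamp01(p.b));
}

// Single pass over the image: the pixel loop is outermost, and both
// stages are applied while the pixel is (hopefully) still in registers.
void processImage(Pixel[] image, float gain) pure nothrow @nogc @safe
{
    foreach (ref p; image)
        p = clampPixel(applyGain(p, gain));
}
```

Calling `processImage(img, 2.0f)` updates `img` in place. As far
as I can tell, with LDC the relevant switches would include `-O3`
and `-enable-cross-module-inlining`, but I still need to confirm
that against the compiler documentation.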

> There will always be three primary colors throughout the entire 
> maintenance life of your code, so just go ahead and specialize 
> for that. You don't need a generic matrix multiplication 
> algorithm, for instance. A specialized 3x3 or 4x4 version could 
> be branchless and SIMD accelerated.

I had thought about this for one of my functions but didn't think 
to extend it further to a function I can use everywhere. I'll do 
that.
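
Roughly what I have in mind is below. This is only a sketch: the
`Vec3` and `Mat3` types and the `mul` function are names I've
invented here, and I'm assuming a row-major layout. The point is
that with the size fixed at 3x3 the multiply has no loops or
branches, which should leave the optimizer free to use SIMD.

```d
// Hypothetical three-channel color vector.
struct Vec3
{
    float x, y, z;
}

// Hypothetical 3x3 matrix, stored row-major in a fixed-size array.
struct Mat3
{
    float[9] m;
}

// Fully unrolled 3x3 matrix times 3-vector product: no loops, no branches.
Vec3 mul(const Mat3 a, const Vec3 v) pure nothrow @nogc @safe
{
    return Vec3(
        a.m[0] * v.x + a.m[1] * v.y + a.m[2] * v.z,
        a.m[3] * v.x + a.m[4] * v.y + a.m[5] * v.z,
        a.m[6] * v.x + a.m[7] * v.y + a.m[8] * v.z);
}
```

A color conversion then becomes a single `mul(matrix, color)` call
per pixel, which should slot into the inner pixel loop.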

Thank you for taking the time to write this up. I'm sure these
tips will go a long way toward improving performance.




More information about the Digitalmars-d mailing list