Requesting Help with Optimizing Code
Kyle Ingraham
kyle at kyleingraham.com
Thu Apr 8 16:37:57 UTC 2021
On Thursday, 8 April 2021 at 03:25:39 UTC, tsbockman wrote:
> On Thursday, 8 April 2021 at 01:24:23 UTC, Kyle Ingraham wrote:
>> The issue I have is with performance.
>
> This belongs in the "Learn" forum, I think.
In hindsight, you are right. I'll post this sort of question
there in the future.
> Personally, I try to stick to `static foreach` and normal
> runtime loops for the inner loops of really CPU-intensive stuff
> like this, so that it's easier to see from the source code what
> the ASM will look like.
Are compilers able to take loops and parallelize them?
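As a rough sketch of the style described in the quote above: a plain runtime loop over pixels with a `static foreach` over the three channels, so the inner body is unrolled at compile time. The `Pixel` type and `applyGain` function are hypothetical names made up for illustration, not taken from the original code.

```d
import std.stdio : writeln;

// Hypothetical pixel type for illustration; not from the original code.
struct Pixel
{
    float[3] channels; // R, G, B
}

// Scale every channel of every pixel by `gain`.
void applyGain(Pixel[] pixels, float gain)
{
    // Ordinary runtime loop over the pixels.
    foreach (ref p; pixels)
    {
        // Expanded at compile time into three straight-line multiplies.
        static foreach (c; 0 .. 3)
        {
            p.channels[c] *= gain;
        }
    }
}

void main()
{
    auto pixels = [Pixel([0.25f, 0.5f, 1.0f]), Pixel([0.1f, 0.2f, 0.3f])];
    applyGain(pixels, 2.0f);
    writeln(pixels[0].channels); // [0.5, 1, 2]
}
```

Because the channel loop has a fixed trip count known at compile time, the source maps fairly directly onto the instructions the optimizer has to work with.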
> The only concrete floating-point type I see mentioned in those
> functions is `double`, which is almost always extreme overkill
> for color calculations.
Great tip here. I'll think about the needs of the data I'm
working on before choosing the first type on the shelf.
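To make the difference concrete, here is a small sketch comparing the two types. The struct names `RgbF` and `RgbD` are hypothetical. A `float` pixel takes half the memory of a `double` one, and the 24-bit significand of `float` can still represent every 8-bit or 16-bit channel value exactly.

```d
import std.stdio : writeln;

// Hypothetical color structs, for illustration only.
struct RgbF { float r, g, b; }
struct RgbD { double r, g, b; }

// Checked at compile time: three 4-byte floats vs. three 8-byte doubles.
static assert(RgbF.sizeof == 12);
static assert(RgbD.sizeof == 24);

// Significand width in bits, including the implicit leading bit.
static assert(float.mant_dig == 24);
static assert(double.mant_dig == 53);

void main()
{
    writeln(RgbF.sizeof, " vs ", RgbD.sizeof); // 12 vs 24
}
```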
> Structure your loops (or functional equivalents) such that the
> "for each pixel" loop is the outermost, or as far out as you
> can get it. Do as much as you can with each pixel while it is
> still in registers or L1 cache. Otherwise, you may end up
> bottle-necked by memory bandwidth.
>
> Make sure you have optimizations enabled, especially cross
> module inlining, -O3, the most recent SIMD instruction set you
> are comfortable requiring (almost everyone in the developed
> world now has Intel SSE4 or ARM Neon), and fused multiply
> add/subtract.
Are there any sources you would recommend for learning more about
these techniques, e.g. code bases, articles, etc.? I'll see how far
I can get with creative googling.
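One way to read the "pixel loop outermost" advice from the quote, as a minimal sketch: instead of making one full pass over the image per operation, run every operation on a value while it is still in a register. The step functions `applyGain`, `applyOffset`, and `clamp01` are hypothetical stand-ins for whatever the real pipeline does.

```d
import std.stdio : writeln;

// Hypothetical per-sample steps, for illustration only.
float applyGain(float v, float gain) { return v * gain; }
float applyOffset(float v, float offset) { return v + offset; }
float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

// Sample loop outermost: each value is loaded once, passed through
// every step, and stored once, rather than once per step.
void processFused(float[] samples, float gain, float offset)
{
    foreach (ref s; samples)
        s = clamp01(applyOffset(applyGain(s, gain), offset));
}

void main()
{
    auto samples = [0.1f, 0.5f, 0.9f];
    processFused(samples, 2.0f, -0.5f);
    writeln(samples); // [0, 0.5, 1]
}
```

With LDC, flags along the lines of `-O3 -mcpu=native` are one possible starting point for the optimization settings mentioned in the quote; the exact set depends on the compiler and the hardware you are willing to require.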
> There will always be three primary colors throughout the entire
> maintenance life of your code, so just go ahead and specialize
> for that. You don't need a generic matrix multiplication
> algorithm, for instance. A specialized 3x3 or 4x4 version could
> be branchless and SIMD accelerated.
I had thought about this for one of my functions but didn't think
to extend it further to a function I can use everywhere. I'll do
that.
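A specialized routine of the kind suggested in the quote might look roughly like the following. This is only a sketch: the `Vec3` and `Mat3` aliases and the `mul` function are hypothetical names, and the matrix is assumed to be stored row-major.

```d
import std.stdio : writeln;

// Hypothetical fixed-size types; Mat3 is row-major (m[row][column]).
alias Vec3 = float[3];
alias Mat3 = float[3][3];

// 3x3 matrix times 3-vector, unrolled at compile time:
// no runtime loops and no branches in the generated body.
Vec3 mul(ref const Mat3 m, ref const Vec3 v)
{
    Vec3 r;
    static foreach (i; 0 .. 3)
        r[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
    return r;
}

void main()
{
    Mat3 m = [[2.0f, 0.0f, 0.0f],
              [0.0f, 3.0f, 0.0f],
              [0.0f, 0.0f, 4.0f]];
    Vec3 v = [1.0f, 2.0f, 3.0f];
    writeln(mul(m, v)); // [2, 6, 12]
}
```

Fixing the dimensions in the types means the compiler knows the full shape of the computation, which gives the optimizer a better chance of using SIMD and fused multiply-add instructions where the target supports them.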
Thank you for taking the time to write this up. I'm sure these
tips will go a long way toward improving performance.