200-600x slower Dlang performance with nested foreach loop

methonash nope at dne.com
Tue Jan 26 23:57:43 UTC 2021


On Tuesday, 26 January 2021 at 18:17:31 UTC, H. S. Teoh wrote:
> Do not do this. Every time you call .array it allocates a new 
> array and copies all its contents over. If this code runs 
> frequently, it will cause a big performance hit, not to mention 
> high GC load.
>
> The function you're looking for is .release, not .array.

Many thanks for the tip! I look forward to trying this soon. For 
reference, the .array call is only performed once.

> That nested loop is an O(n^2) algorithm. Meaning it will slow 
> down *very* quickly as the size of the array n increases.  You 
> might want to think about how to improve this algorithm.

Nice observation, and yes, this would typically be an O(n^2) 
approach.

However, due to subsetting the input dataset to unique strings 
and then sorting by descending length, the inner foreach loop 
does not iterate over all of n, only from index i+1 through the 
end of the array.

Thus the work is roughly halved: each pair of strings is 
compared exactly once, for n(n-1)/2 = (n^2 - n)/2 comparisons. 
(Strictly speaking this is still O(n^2), since constant factors 
disappear in big-O notation.)
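As a sanity check, the triangular count can be computed directly; 
a minimal sketch (variable names are my own):

```d
import std.stdio;

void main()
{
    // With the inner loop starting at i+1, each pair of elements
    // is compared exactly once, giving the triangular number
    // n * (n - 1) / 2 rather than n^2.
    enum ulong n = 46_000;
    ulong comparisons = n * (n - 1) / 2;
    writeln(comparisons); // prints 1057977000 (~1.06 billion)
}
```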

Further: the original dataset has 64k strings. Squaring that 
yields ~4.1 billion string comparisons.

Once de-duplicated, the dataset is reduced to ~46k strings. 
Applying n(n-1)/2 at that size yields ~1.06 billion string 
comparisons.

So, performing steps 1 through 3 reduces the brute-force string 
comparison count roughly four-fold on my test development dataset.

> Using AA's may not necessarily improve performance.  It depends 
> on what your code does with it.  Because AA's require random 
> access to memory, it's not friendly to the CPU cache hierarchy, 
> whereas traversing linear arrays is more cache-friendly and in 
> some cases will out-perform AA's.

I figured a built-in AA might be an efficient path to performing 
unique string de-duplication. If there's a more performant method 
available, I'll certainly try it.
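For reference, this is a minimal sketch of what I mean by AA-based 
de-duplication followed by the descending-length sort (the function 
name is mine, not from the original code):

```d
import std.algorithm : sort;

// Hypothetical helper: collapse duplicates via a built-in AA,
// then sort the surviving strings by descending length.
string[] dedupByLength(string[] input)
{
    bool[string] seen;
    foreach (s; input)
        seen[s] = true;          // AA insertion discards duplicates
    auto result = seen.keys;     // unique strings, unspecified order
    result.sort!((a, b) => a.length > b.length);
    return result;
}
```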

> First of all, you need to use a profiler to identify where the 
> hotspots are.  Otherwise you could well be pouring tons of 
> effort into "optimizing" code that doesn't actually need 
> optimizing, while completely missing the real source of the 
> problem.  Whenever you run into performance problems, do not 
> assume you know where the problem is, profile, profile, profile!

Message received. Given that D is the first compiled language 
I've semi-seriously dabbled with, I have no real experience with 
profilers.
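In case it helps anyone in the same position: DMD ships an 
instrumenting profiler behind the -profile flag, so no separate 
tooling is strictly required (the output file names below are what 
DMD emits by default):

```shell
# Compile with instrumentation; running the binary then writes
# trace.log (per-function call counts and timings) and trace.def
# into the working directory.
dmd -profile -O app.d
./app

# On Linux, a sampling profiler such as perf also works with an
# optimized LDC build that keeps debug info:
# ldc2 -O2 -g app.d && perf record ./app && perf report
```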

> Second, you only posted a small fragment of your code, so it's 
> hard to say where the problem really is.  I can only guess 
> based on what you described.  If you could post the entire 
> program, or at least a complete, compilable and runnable 
> excerpt thereof that displays the same (or similar) performance 
> problems, then we could better help you pinpoint where the 
> problem is.

Yes, if subsequent efforts continue to fail, I'll put together a 
complete, compilable, and runnable demo of the issue.

For professional reasons (because I no longer work in academia), 
I cannot share the original source code for the issue presented 
here, but I can attempt to reproduce it in a minimally complete 
form for a public dataset.


More information about the Digitalmars-d-learn mailing list