200-600x slower Dlang performance with nested foreach loop

Tue Jan 26 17:40:36 UTC 2021

Greetings Dlang wizards,

I seek knowledge/understanding of a very frustrating phenomenon 
I've experienced over the past several days.

The problem space:

1) Read a list of strings from a file
2) De-duplicate all strings into the subset of unique strings
3) Sort the subset of unique strings by descending length and 
then by ascending lexicographic identity
4) Iterate through the sorted subset of unique strings, 
identifying smaller sequences with perfect identity to their 
largest possible parent string

I have written a Dlang program that performantly progresses 
through step #3 above. I used a built-in AA (associative array) 
to uniquely de-duplicate the initial set of strings and then used 
multiSort(). Performance was good up till this point, especially 
with use of the LDC compiler.

Things went sideways at step #4: because multiSort() returns a 
SortedRange, I used .array to convert the returned SortedRange 
into an array of type string[]. This appeared to work, and 
neither DMD nor LDC threw any warnings/errors for doing this.

With the formally returned array, I then attempted to construct a 
double foreach loop to iterate through the sorted array of unique 
strings and find substring matches.

foreach( i, ref pStr; sortedArr )
{
     foreach( j, ref cStr; sortedArr[ i + 1 .. $ ] )
     {
         if( indexOf( pStr, cStr ) > -1 )
         {
             // ...
         }
     }
}

Before adding the code excerpt above, the Dlang program was 
taking ~1 second on an input file containing approx. 64,000 
strings.

By adding the code above, the program now takes 6 minutes to 
complete. An attempt was made to more efficiently perform 
ASCII-only substring searching by converting the sorted string[] 
into ubyte[][] and then using countUntil() instead of indexOf(), 
but this had an effect that was completely opposite to what I had 
previously experienced: the program then took over 20 minutes to 
complete!

Thus, I am entirely baffled.

My first attempt to solve this problem space used a small Perl 
program to perform steps 1 through 3, which would then pipe 
intermediate output to a small Dlang program handling only step 
#4 using dynamic arrays (no use of AAs) of ubyte[][] with use of 
countUntil().

The Dlang code for the nested foreach block above is essentially 
near-identical between my two Dlang implementations. Yet, the 
second implementation--where I'm trying to solve the entire 
problem space in D--has absolutely failed in terms of performance.

Perl+D, ubyte[][], countUntil() :: under 2 seconds
only D, string[], indexOf() :: ~6 minutes
only D, ubyte[][], countUntil() :: >20 minutes

Please advise. This nightmarish experience is shaking my 
confidence in using D.