200-600x slower Dlang performance with nested foreach loop

methonash nope at dne.net
Sat Jan 30 23:13:56 UTC 2021


Greetings all,

Many thanks for sharing your collective perspective and advice 
thus far! It has been very helpful and instructive. I return 
bearing live data and a minimally complete, compilable, and 
executable program to experiment with and potentially optimize. 
The dataset can be pulled from here:

https://filebin.net/qf2km1ea9qgd5skp/seqs.fasta.gz?t=97kgpebg

Running "cksum" on this file:

1477520542 2199192 seqs.fasta.gz

Naturally, you'll need to gunzip this file. The decompressed file 
contains a string on every even-numbered line; these strings have 
already been de-duplicated and sorted, first by descending length 
and then alphabetically.
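
For reference, the file alternates a header line (starting with 
'>') and a sequence line, which is why the strings land on 
even-numbered lines. A made-up two-record excerpt (identifiers and 
sequences below are illustrative only):

>seq_000001
ACGTACGTACGT
>seq_000002
CGTACGT

Note that the second sequence above happens to be a substring of 
the first.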

From my initial post, the focus is now entirely on step #4: 
finding and removing strings that can be entirely absorbed (i.e., 
contained as a substring) by their largest possible parent. In the 
excerpt above, seq_000002 would be absorbed by seq_000001.

And now for the code:


import std.stdio : writefln, stdin;
import std.conv : to;
import std.string : indexOf;

void main()
{
    string[] seqs;

    // Collect sequence lines; skip FASTA headers. The empty-line
    // guard also prevents an out-of-bounds line[0] on a blank line.
    foreach (line; stdin.byLine())
    {
        if (line.length == 0 || line[0] == '>') continue;
        seqs ~= to!string(line);
    }

    // seqs is sorted by descending length, so a potential parent
    // always precedes its children; absorbed strings are blanked out.
    foreach (i; 0 .. seqs.length)
    {
        if (seqs[i].length == 0) continue; // already absorbed

        foreach (j; i + 1 .. seqs.length)
        {
            // Equal-length strings cannot absorb one another,
            // because the set is already de-duplicated.
            if (seqs[j].length == 0 || seqs[i].length == seqs[j].length)
                continue;

            if (indexOf(seqs[i], seqs[j]) > -1)
            {
                seqs[j] = "";
                writefln("%s contains %s", i, j);
            }
        }
    }
}


Compile the source and run the executable with stdin redirected.
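For example, with LDC (the compiler choice and flags here are 
illustrative, not part of the original timings, though 
optimization flags will affect absolute run times):

ldc2 -O2 -release -of=substr substr.d

Then: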

./substr < seqs.fasta

See any straightforward optimization paths here?
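
One direction that might help (a minimal sketch, assuming 
std.parallelism's defaults; not benchmarked): since the array is 
sorted by descending length, a string can only be absorbed by an 
earlier one, so the scan can be flipped into independent, 
read-only passes per index and spread across cores:

import std.stdio : writefln, stdin;
import std.conv : to;
import std.string : indexOf;
import std.parallelism : parallel;
import std.range : iota;

void main()
{
    string[] seqs;
    foreach (line; stdin.byLine())
    {
        if (line.length == 0 || line[0] == '>') continue;
        seqs ~= to!string(line);
    }

    // absorbed[j] is written only by the task that owns j, so the
    // parallel loop needs no synchronization.
    auto absorbed = new bool[seqs.length];

    foreach (j; parallel(iota(1, seqs.length)))
    {
        // Only earlier (longer or equal-length) strings can contain
        // seqs[j]; equal lengths are skipped since the set is
        // de-duplicated. Scanning from 0 finds the largest parent first.
        foreach (i; 0 .. j)
        {
            if (seqs[i].length == seqs[j].length) continue;
            if (indexOf(seqs[i], seqs[j]) > -1)
            {
                absorbed[j] = true;
                break;
            }
        }
    }

    foreach (j; 0 .. seqs.length)
        if (absorbed[j]) writefln("%s is absorbed", j);
}

Unlike the serial version, this never blanks out absorbed strings, 
so it repeats some work; whether the extra cores outweigh that 
redundancy would need measuring.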

Out of curiosity, I experimented with string[] and ubyte[][] and 
several search functions (indexOf, canFind, countUntil) to find 
the best performer. My off-the-cuff results:

string[] with indexOf() :: 26.5-27 minutes
string[] with canFind() :: >28 minutes
ubyte[][] with canFind() :: 27.5 minutes
ubyte[][] with countUntil() :: 27.5 minutes

Accordingly, the code above uses string[] with indexOf(). Tests 
were performed on an Intel(R) Xeon(R) Platinum 8259CL CPU @ 
2.50GHz.

I have additional questions and some confusion surrounding the 
foreach() syntax I had to use above, but performance remains my 
chief immediate concern.
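
As an aside on that syntax: the index-range form used above, 
foreach (i; 0 .. seqs.length), is only one option; D also allows 
iterating with index and element together. A hypothetical fragment:

foreach (i, seq; seqs)
{
    // i is the index (size_t) and seq the element; declare seq
    // as `ref` to mutate the array in place.
}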

