Help with an algorithm!

CRAIG DILLABAUGH via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Jun 15 08:03:48 PDT 2017


On Thursday, 15 June 2017 at 13:41:07 UTC, MGW wrote:
> On Thursday, 15 June 2017 at 13:16:24 UTC, CRAIG DILLABAUGH 
> wrote:
>
> The purpose is to detect changes in a file system.
> Sorting is slow, and so is hashing. Building a tree costs
> about as much as sorting. The best result so far:
>
> string[] rez;
>
> foreach (str; m2) {
>     bool fFind;
>     int j;
>     // linear scan of m1 for the current entry of m2
>     foreach (int i, s; m1) {
>         if (str == s) { fFind = true; j = i; break; }
>     }
>     if (!fFind) {
>         rez ~= str;  // not in m1: a new entry
>     } else {
>         // remove the match so later scans skip it:
>         // overwrite with the last element and shrink the array
>         m1[j] = m1[$ - 1];
>         m1.length = m1.length - 1;
>     }
> }
>
> // rez holds the result
>
> How can this be parallelized across threads?

Radix sort is O(N) time, which is as fast as you can hope for. 
But given your specific problem domain (the strings are paths), 
an initial radix sort step likely won't gain you much: everything 
is going to be sorted into a small subset of the buckets. So I 
guess you can scrap that idea.

Knowing that your strings are actually file paths, I think 
building some sort of tree structure over M1 wouldn't be 
unreasonable. Say you go two or three levels deep into your 
directory structure (i.e., nodes are labelled with directory 
names) and use that to split M1 into buckets; if some bucket has 
too many entries, you can apply the same split recursively. Since 
you are only building a constant number of levels, and the number 
of nodes is not likely to be too large, this step should do much 
better than c * N log N time.
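
For illustration, here is a rough sketch of that bucketing step 
in D, using a plain associative array keyed on the first two path 
components (the function name and the '/' separator are my 
assumptions, not anything from your code):

import std.algorithm : splitter;
import std.array : join;
import std.range : take;

// Group paths by their first two directory components.
// Buckets with too many entries could be split again recursively.
string[][string] bucketByPrefix(string[] paths)
{
    string[][string] buckets;
    foreach (p; paths)
    {
        // e.g. "usr/lib/libfoo.so" -> key "usr/lib"
        auto key = p.splitter('/').take(2).join("/");
        buckets[key] ~= p;
    }
    return buckets;
}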

Then you search with the elements of M2. You should be able to do 
this in a multi-threaded way: once built, your data structure on 
M1 is read-only, so you can just split M2 over X threads and 
search. I am not an expert in this regard, though, so perhaps 
someone better informed than I can chime in.
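
As a rough sketch of how that split could look with 
std.parallelism (hypothetical code; here the read-only structure 
is just a bool[string] set built from M1 rather than the tree 
above):

import std.parallelism : parallel;

// Return the entries of m2 that do not occur in m1.
string[] newEntries(string[] m1, string[] m2)
{
    // Build the lookup structure once; after this it is only
    // read, so sharing it across threads is safe.
    bool[string] inM1;
    foreach (s; m1)
        inM1[s] = true;

    // Each index of isNew is written by exactly one iteration,
    // so the parallel loop needs no locking.
    auto isNew = new bool[m2.length];
    foreach (i, str; parallel(m2))
        isNew[i] = (str !in inM1);

    // Collect the results sequentially.
    string[] rez;
    foreach (i, str; m2)
        if (isNew[i])
            rez ~= str;
    return rez;
}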

Since the strings will tend to have long common prefixes, Ivan's 
trie idea would also work well.
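
For completeness, a bare-bones path trie might look something 
like this (my own sketch, not Ivan's code; one node per path 
component):

import std.algorithm : splitter;

// Minimal path trie: one node per directory/file name component.
final class TrieNode
{
    TrieNode[string] children;
    bool terminal;  // true if a full path ends at this node
}

void insert(TrieNode root, string path)
{
    auto node = root;
    foreach (part; path.splitter('/'))
    {
        auto next = part in node.children;
        if (next is null)
        {
            auto n = new TrieNode;
            node.children[part] = n;
            node = n;
        }
        else
            node = *next;
    }
    node.terminal = true;
}

bool contains(TrieNode root, string path)
{
    auto node = root;
    foreach (part; path.splitter('/'))
    {
        auto next = part in node.children;
        if (next is null)
            return false;
        node = *next;
    }
    return node.terminal;
}

Membership tests for M2 then just walk the trie component by 
component; the misses are your new entries.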
