Help with an algorithm!
CRAIG DILLABAUGH via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Jun 15 08:03:48 PDT 2017
On Thursday, 15 June 2017 at 13:41:07 UTC, MGW wrote:
> On Thursday, 15 June 2017 at 13:16:24 UTC, CRAIG DILLABAUGH
> wrote:
>
> The purpose is to detect changes in a file system.
> Sorting is slow, and so is hashing. Building a tree is
> just as slow as sorting.
> So far the best result:
>
> string[] rez;
>
> foreach(str; m2) {
>     bool fFind; int j;
>     foreach(int i, s; m1) {
>         if(str == s) { fFind = true; j = i; break; }
>     }
>     if(!fFind) { rez ~= str; }
>     else { m1[j] = m1[$-1]; m1.length = m1.length - 1; }
> }
>
> // rez => result
>
> How can this be parallelized across threads?
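For comparison, the quadratic scan in the quoted code can be expressed with D's built-in associative array used as a multiset over m1 (mirroring how the original loop consumes each matched m1 entry once). The function name here is mine, not from the post:

```d
import std.stdio;

// Returns the strings of m2 that have no match left in m1.
// Each m1 entry can only be consumed once, matching the
// element removal in the original nested loop.
string[] difference(string[] m1, string[] m2)
{
    int[string] counts;              // multiset over m1
    foreach (s; m1) counts[s]++;

    string[] rez;
    foreach (str; m2)
    {
        if (auto p = str in counts)
        {
            if (--(*p) == 0) counts.remove(str);
        }
        else
            rez ~= str;
    }
    return rez;
}

void main()
{
    auto m1 = ["/a/b", "/a/c", "/d"];
    auto m2 = ["/a/b", "/x/y", "/d", "/d"];
    writeln(difference(m1, m2)); // unmatched entries of m2
}
```

This replaces the O(N*M) nested scan with expected O(N + M) lookups, at the cost of the hashing the original poster was worried about.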
Radix sort is O(N) time, which is as fast as you can hope for.
But given your specific problem domain (the strings are file
paths), an initial radix-sort pass likely won't gain you much:
everything will land in a small subset of the buckets. So I
guess you can scrap that idea.
Knowing that your strings are actually file paths, I think
building some sort of tree structure over M1 wouldn't be
unreasonable. Say you go two or three levels deep into your
directory structure (i.e. nodes are labelled with directory
names) and use that to split M1 into buckets. If some bucket has
too many entries you can apply this recursively. Since you are
only building a constant number of levels, and the number of
nodes is not likely to be too large, this step should do much
better than the c * N log N of a full sort.
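The first level of that bucketing step might be sketched like this (splitting on the first directory component; the helper name is my own, not from the post):

```d
import std.string : indexOf;

// Group paths by their first directory component, e.g.
// "/usr/bin/ls" lands in the "/usr" bucket. Applying the same
// split again inside an oversized bucket gives the next level
// of the tree.
string[][string] bucketByTopDir(string[] paths)
{
    string[][string] buckets;
    foreach (p; paths)
    {
        // take everything up to the second '/'
        auto cut = p.indexOf('/', 1);
        auto key = cut < 0 ? p : p[0 .. cut];
        buckets[key] ~= p;
    }
    return buckets;
}
```

Searching with an element of M2 then only has to compare against the (much smaller) bucket sharing its prefix, rather than all of M1.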
Then you search with the elements of M2. You should be able to do
this in a multi-threaded way: once built, your data structure on
M1 is read-only, so you can just split M2 over X threads and
search each chunk independently. I am not an expert in this
regard though, so perhaps someone better informed than I can
chime in.
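A minimal sketch of that split using std.parallelism, treating M1 as a plain set (unlike the original code, this does not consume duplicate M1 entries); the function name is mine:

```d
import std.parallelism : parallel;

// Flag in parallel which entries of m2 are absent from the
// read-only set built over m1, then gather the results serially.
string[] parallelDifference(string[] m1, string[] m2)
{
    bool[string] inM1;               // read-only after this loop
    foreach (s; m1) inM1[s] = true;

    auto missing = new bool[m2.length];
    foreach (i, str; parallel(m2))   // each index is written by one task only
        missing[i] = (str in inM1) is null;

    string[] rez;
    foreach (i, str; m2)
        if (missing[i]) rez ~= str;
    return rez;
}
```

Writing per-index flags and collecting them afterwards avoids any locking on a shared result array inside the parallel loop.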
Since the strings will tend to have long common prefixes, Ivan's
trie idea would also work well.
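A bare-bones sketch of such a trie (character-by-character; a production version would compress shared prefixes into single edges):

```d
// Minimal string trie: insert all of M1 once, then membership
// tests for M2 share work across common path prefixes.
final class TrieNode
{
    TrieNode[char] next;
    bool terminal;   // true if a whole string ends here
}

void insert(TrieNode root, string s)
{
    auto node = root;
    foreach (c; s)
    {
        if (auto p = c in node.next)
            node = *p;
        else
        {
            auto child = new TrieNode;
            node.next[c] = child;
            node = child;
        }
    }
    node.terminal = true;
}

bool contains(TrieNode root, string s)
{
    auto node = root;
    foreach (c; s)
    {
        auto p = c in node.next;
        if (p is null) return false;
        node = *p;
    }
    return node.terminal;
}
```

Each lookup is O(length of the string) regardless of how many strings are stored, and paths sharing a directory prefix walk the same nodes.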