C++Now! 2012 slides

Peter Alexander peter.alexander.au at gmail.com
Thu Jun 7 16:50:36 PDT 2012


On Thursday, 7 June 2012 at 22:29:09 UTC, Andrei Alexandrescu 
wrote:
> Great points, example could be a lot better. Maybe it's time 
> you do get started on find(). Destroy or be destroyed.

Ok...

This overload:

R find(alias pred = "a == b", R, E)(R haystack, E needle)
if (isInputRange!R &&
         is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
{
     for (; !haystack.empty; haystack.popFront())
     {
         if (binaryFun!pred(haystack.front, needle)) break;
     }
     return haystack;
}


Is fine. In fact it's practically perfect. It's the obvious 
solution to a simple problem. The biggest problem is the "is 
typeof" syntax, but that's not your fault.

Second overload:

R1 find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle)
if (isForwardRange!R1 && isForwardRange!R2
         && is(typeof(binaryFun!pred(haystack.front, 
needle.front)) : bool)
         && !isRandomAccessRange!R1)
{
     static if (is(typeof(pred == "a == b")) && pred == "a == b" 
&& isSomeString!R1 && isSomeString!R2
             && haystack[0].sizeof == needle[0].sizeof)
     {
         //return cast(R1) find(representation(haystack), 
representation(needle));
         // Specialization for simple string search
         alias Select!(haystack[0].sizeof == 1, ubyte[],
                 Select!(haystack[0].sizeof == 2, ushort[], 
uint[]))
             Representation;
         // Will use the array specialization
         return cast(R1) .find!(pred, Representation, 
Representation)
             (cast(Representation) haystack, cast(Representation) 
needle);
     }
     else
     {
         return simpleMindedFind!pred(haystack, needle);
     }
}

As far as I can tell, this is a proxy for various substring 
search implementations.

Problem 1: Why is find overloaded to do substring search? Why 
does it do substring and not subset or subsequence? substring is 
a completely different algorithm from linear search and even has 
different asymptotic running time. This is needlessly overloaded, 
it adds nothing.

Problem 2: The attempted specialisation is atrocious. It compares 
the predicate string with "a == b". I just did a quick check, and 
this means that these two calls DO NOT use the same algorithm:

string a = ..., b = ...;
auto fast = find!"a == b"(a, b);
auto slow = find!"a==b"(a, b);

I have to be honest, I have only just noticed this, but that's 
really sloppy.

It's also a direct symptom of over-complicated code. As a user, I 
would 100% expect these calls to be the same. If I accidentally 
used the second version, I would have no warning: my code would 
just run slower and I'd be none the wiser. Only upon careful 
inspection of the source could you discover what this "simple" 
code does, and you would be horrified like I am now.

The next two overloads of find appears to implement a couple of 
reasonably fast substring. Again, it would take me a minute or so 
to figure out exactly what algorithm was being used.

After that there's a multi-range find. Seems simple enough, but 
it's yet another overload to consider and I'm not sure it's 
commonly used enough to even warrant existence. I'd hate to think 
what would happen if I wanted the single-range search but 
accidentally added an extra parameter.

In summary: the specialisation detection is shocking, which leads 
me to question what other static ifs are accidentally firing or 
failing. If the code was more simple, and less overloaded it 
would be easy to reason about all this, but it's not. The various 
find implementations span a few hundred lines, and all have 
numerous compile time branches and checks. The cyclomatic 
complexity must be huge.

How I'd do things:

- The first version of find would be the only one. No overloads, 
no specialisation.
- substring would be a separate, non-predicated function. It 
would be specialised for strings. I'm too tired now to think 
about the variety of range specialisations, but I'm sure there's 
a more elegant way to handle to combinations.
- Drop the variadic find altogether.
- Get rid of BoyerMooreFinder + find overload, replace with a 
boyerMoore function.



More information about the Digitalmars-d mailing list