Improving std.algorithm.find

Sat Jul 17 15:55:26 PDT 2010

I was thinking of improving std.find. We have this bug report:

http://d.puremagic.com/issues/show_bug.cgi?id=3923

which is pretty vague but does have a point.

For starters, it should be told that std.algorithm.find _does_ a lot, 
which at least partially justifies its complexity. One thing that I 
haven't seen other find()s doing is to be able to search in one pass a 
given range for multiple other ranges, like this:

int[] a = [ 1, 4, 2, 3 ];
assert(find(a, 5, 4, 3) == tuple([ 4, 2, 3 ], 2));

When passed more than two arguments, find returns a tuple continuing the 
searched ranged positioned on the element found and the 1-based index of 
the parameter that was found. The trick is that find() makes exactly one 
pass through the searched range, which is often more efficient than 
searching the same range for each element in turn. Also the one-pass 
approach works with input ranges so it doesn't put pressure on the range 
capabilities.

However the simplified find() looks like, I'd like to keep this feature 
unless it brings serious aggravation. Right now it's the #1 factor that 
complicates find's signature and implementation.

Another aspect I'd like to discuss is use of Boyer-Moore and related 
fast finding techniques. Currently the use of B-M is explicit and a bit 
awkward. I was thinking instead to automatically detect its 
appropriatedness and use it transparently. Bearophile posted at a link 
to such a blend string search routine 
(http://effbot.org/zone/stringlib.htm) that I think can be generalized 
quite easily.

Any ideas - please chime in.

Andrei