Opportunistic worst-case complexity upgrade? aka binary search with `find`
Jakob Ovrum via Digitalmars-d
digitalmars-d at puremagic.com
Wed Jul 23 18:21:43 PDT 2014
We should talk about a design question surrounding binary search
with `canFind`/`find` and possibly other linear-search functions.
Currently we have binary search in Phobos as part of
std.range.SortedRange. Its interface is not compatible with
`canFind` or `find` - you can't simply wrap the haystack in a
SortedRange and pass it to an algorithm to give you logarithmic
complexity.
The first question is whether this opportunistic upgrade is
desirable - binary search has much better worst-case complexity
than linear search, but it's not necessarily faster in the
average case, which depends on the specific use pattern. One
important thing to note is that, assuming the binary-search
specialization is documented, the user can use
`SortedRange.release` to explicitly request linear search.
Myself and others have sometimes mistakenly expected `find` and
friends to be specialized for `SortedRange` inputs,
opportunistically providing better worst-case complexity, but
this is not the case. It seems simple at first glance, but the
problem lies in the predicate - binary search can only be
leveraged when the specific order is known:
---
auto data = assumeSorted([1, 2, 3]);
// Equality, so bsearch then trot left
auto result = data.find!((x, e) => x == e)(2); // default
predicate
assert(result.equal([2, 3]));
// Opposite order and exclusive, bsearch then trot right
result = data.find!((x, e) => x > e)(2);
assert(result.equal([3]));
// Same order, bsearch then trot left.
// Compare first element as an optimization?
result = data.find!((x, e) => x < e)(0);
assert(result.empty);
struct S { string name; }
auto data2 = assumeSorted!((a, b) => a.name < b.name)(
[S("a"), S("b"), S("c")]
);
// Same order and exclusive, but with a transformation, yikes...
auto result2 = data2.find!((x, e) => x.name < e.name)(S("b"));
assert(result2.equal(data2));
---
Identifying the characteristics described in the code comments
above is the biggest problem: predicate functions don't have any
notion of equality.
String lambdas can be compared for equality, but they're really
fragile: "a == b" != "b == a" etc. Besides, string lambdas are
undesirable for other reasons and should be phased out in the
long-term[1].
Someone suggested defining a standard set of predicates, making
it look like this:
---
auto data = assumeSorted!less([1, 2, 3]); // Would be default
// Equality, so bsearch then trot left
auto result = data.find!equality(2); // Would be default
---
Templates can be compared with __traits(isSame, ...), so this
approach seems feasible. If we want to do this, this seems like
the most realistic approach. I'm not sure if it can be made to
work when transformation of arguments is involved, but it might
still be worth doing.
Another issue is that SortedRange's interface does not actually
support all the above patterns because it splits the result into
3 instead of 2; we would need to amend SortedRange to support the
inverse functions of lowerBound and upperBound.
So, what does the community think? Desirable, or not? Thoughts
about implementation?
[1]
http://forum.dlang.org/post/jnlqesrwxfekdsxjerlp@forum.dlang.org
More information about the Digitalmars-d
mailing list