The rfind challenge

Mon Jan 14 23:20:31 PST 2013

On Tuesday, 15 January 2013 at 05:51:16 UTC, Andrei Alexandrescu 
wrote:
> (Warning: this assumes a fair bit of background with ranges and 
> the STL.)
>
> We've had a long-standing weakness of ranges when compared with 
> STL iterators: in a find() call we get to access the balance of 
> the range from the found position (if any) through the end of 
> the initial range. In contrast, STL's find returns an iterator 
> that can be combined with the beginning of the initial range or 
> the end of the initial range, thus offering two ranges instead 
> of one.
>
> D offers the findSplit family which partially deals with this, 
> but in an arguably imperfect way because the range from the 
> beginning of the initial range to the found element has a 
> different type than the initial range.
>
> In spite of that, things have worked out well.
>
> Nowadays I've been thinking (inspired by an exchange on a C=+ 
> mailing list) that STL's rfind (http://goo.gl/Btg8s) is a real 
> challenge for D's ranges.
>
> In D, r.rfind(e) should return a range starting with the last 
> occurrence of e in r and spanning through the rest of r.
>
> If r is a forward range but not better, this is rather simple:
>
> R rfind(R, E)(R range, E element)
> {
>     for (;;)
>     {
>         auto ahead = range.save.find(element);
>         if (ahead.empty) return range;
>         range = ahead;
>     }
> }

I'd hardly call that an ideal solution >:( The point of rfind is 
to not start the search from the start of the string. What if 
your input was:
//----
string mobyDick = ...:
auto r = rfind(mobyDick, ishmael); This may take a while...
//----

That, and it would fail with rfind("A_bc_bc_bc_A", "bc_bc"): It 
would find "bc_bc_bc_A" instead of "bc_bc_A".

The root cause is that a bidirectional range just can't be 
"cut"/"sliced" at a given point, whereas iterators allow this.

As much as I love ranges, I think this is a fundamental flaw for 
anything less that hasSlicing.

So far, the problem has stayed under the radar, thanks (because?) 
of take, but the problems it brings are clear:
-Bias towards front iteration.
-Type system warping.
-Loss of bidirectionality when taken.

Algorithms suffer because of this. Containers suffer because of 
this.

I don't have any real solution to offer (there'd be a real 
complexity cost to any proposal), but I think there *needs* to be 
a way to provide "inter" slicing of ranges. Something along the 
lines of:
//----
bidir a = [1, 2, 3, 4, 5, 6];
bidir b = a.drop(2).dropBack(2); //[3, 4];
auto l = cutBefore(a, b); //Gets the part of a that is before b
auto r = cutAfter(a, b); //Gets the part of a that is after b
static assert(typeof(a) == typeof(l));
static assert(typeof(a) == typeof(r));
assert(equal(l), [1, 2]);
assert(equal(r), [5, 6]);
//----

The advantage here is that:
1) consistent typing.
2) no direction bias.

If we had this, then find could just return the slice containing 
the found element, and user could work his way from there.

Problem is bloating the interface ... That, and we'd also need 
the versions that keep the cut element...
//----
bidir a = [1, 2, 3, 4, 5, 6];
bidir b = a.drop(2).dropBack(2); //[3, 4];
auto l = mergeBefore(a, b); //Gets the part of a that is before 
b, with b
auto r = mergeAfter(a, b); //Gets the part of a that is after b, 
with b
assert(equal(l), [1, 2, 3, 4]);
assert(equal(r), [3, 4, 5, 6]);
//----

It is a complicated problem, but one that is worth solving, IMO.