std.algorithm.splitter on a string not always bidirectional

Fri Jan 22 16:57:38 UTC 2021

On Friday, 22 January 2021 at 14:14:50 UTC, Steven Schveighoffer 
wrote:
> On 1/22/21 12:55 AM, Jon Degenhardt wrote:
>> On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt 
>> wrote:
>>> On Thursday, 21 January 2021 at 22:43:37 UTC, Steven 
>>> Schveighoffer wrote:
>>>> auto sp1 = "a|b|c".splitter('|');
>>>>
>>>> writeln(sp1.back); // ok
>>>>
>>>> auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
>>>>
>>>> writeln(sp2.back); // error, not bidirectional
>>>>
>>>> Why? is it an oversight, or is there a good reason for it?
>>>>
>>>
>>> I believe the reason is two-fold. First, splitter is lazy. 
>>> Second, the range splitting is defined in the forward 
>>> direction, not the reverse direction. A bidirectional range 
>>> is only supported if it is guaranteed that the splits will 
>>> occur at the same points in the range when run in either 
>>> direction. That's why the single element delimiter is 
>>> supported. It's clearly the case for the predicate function in 
>>> your example. If that's known to be always true then perhaps 
>>> it would make sense to enhance splitter to generate 
>>> bidirectional results in this case.
>>>
>> 
>> Note that the predicate might use a random number generator to 
>> pick the split points. Even for the same sequence of random 
>> numbers, the split points would be different if run from the 
>> front than if run from the back.
>
> I think this isn't a good explanation.
>
> All forms of splitter accept a predicate (including the one 
> which supports a bi-directional result). Many other phobos 
> algorithms that accept a predicate provide bidirectional 
> support. The splitter result is also a forward range (which 
> makes no sense in the context of random splits).
>
> Finally, I'd suggest that even if you split based on a subrange 
> that is also bidirectional, it doesn't make sense that you 
> couldn't split backwards based on that. Common sense says a 
> range split on substrings is the same whether you split it 
> forwards or backwards.
>
> I can do this too (and in fact I will, because it works, even 
> though it's horrifically ugly):
>
> auto sp3 = "a.b|c".splitter!((c, unused) => 
> !isAlphaNum(c))('?');
>
> writeln(sp3.back); // ok
>
> Looking at the code, it looks like the first form of splitter 
> uses a different result struct than the other two (which have a 
> common implementation). It just needs cleanup.
>
> -Steve

I think the idea is that if a construct like 'xyz.splitter(args)' 
produces a range with the sequence of elements {"a", "bc", 
"def"}, then 'xyz.splitter(args).back' should produce "def". But, 
if finding the split points starting from the back results in 
something like {"f", "de", "abc"} then that relationship doesn't 
hold, and the results are unexpected.

Note that in the above example, 'xyz.retro.splitter(args)' might 
produce {"f", "ed", "cba"}, so again not the same.
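
As a rough illustration of how the two directions can disagree, here is a sketch using a multi-element separator that overlaps itself. This is a different case from the predicate in the original question, and lastPieceFromBack is a hypothetical helper written only for this example, not something in Phobos:

```d
import std.algorithm.iteration : splitter;
import std.array : array;
import std.string : lastIndexOf;

// Hypothetical helper (not part of Phobos): the piece after the last
// separator match when the search starts from the back of the string.
string lastPieceFromBack(string s, string sep)
{
    immutable i = s.lastIndexOf(sep);
    return i < 0 ? s : s[i + sep.length .. $];
}

void main()
{
    // Forward scan: "aa" matches at index 0, leaving "a" as the last piece.
    auto forward = "aaa".splitter("aa").array;
    assert(forward == ["", "a"]);

    // Backward scan: "aa" matches at index 1, leaving an empty last piece.
    assert(lastPieceFromBack("aaa", "aa") == "");
}
```

Here the forward split ends in "a" while a scan from the back ends in an empty piece, so a 'back' computed from the back would not match the last element of the forward sequence.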

Another way to look at it: If split (eager) took a predicate, 
then 'xyz.splitter(args).back' and 'xyz.split(args).back' should 
produce the same result. But they will not with the example given.
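
For reference, a small sketch of the forms discussed in this thread. It assumes a Phobos version that behaves as described earlier in the thread. The result for sp2 is printed rather than asserted, since it may differ between versions:

```d
import std.algorithm.iteration : splitter;
import std.array : split;
import std.range.primitives : back, isBidirectionalRange;
import std.stdio : writeln;
import std.uni : isAlphaNum;

void main()
{
    // Single-element separator: the lazy and eager forms agree on the
    // last element.
    auto sp1 = "a|b|c".splitter('|');
    assert(sp1.back == "a|b|c".split('|').back);

    // Unary predicate form from the original question. Reported in this
    // thread as not bidirectional, so only print the trait here.
    auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
    writeln(isBidirectionalRange!(typeof(sp2)));

    // Binary predicate with an ignored separator argument (the workaround
    // quoted above). Splitting happens at '.' and '|'.
    auto sp3 = "a.b|c".splitter!((c, unused) => !isAlphaNum(c))('?');
    assert(sp3.back == "c");
}
```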

I believe these consistency issues are the reason why the 
bidirectional support is limited.

Note: I didn't design any of this, but I did redo the examples in 
the documentation at one point, which is why I looked at this.

--Jon