std.algorithm.splitter on a string not always bidirectional
Jon Degenhardt
jond at noreply.com
Fri Jan 22 16:57:38 UTC 2021
On Friday, 22 January 2021 at 14:14:50 UTC, Steven Schveighoffer
wrote:
> On 1/22/21 12:55 AM, Jon Degenhardt wrote:
>> On Friday, 22 January 2021 at 05:51:38 UTC, Jon Degenhardt
>> wrote:
>>> On Thursday, 21 January 2021 at 22:43:37 UTC, Steven
>>> Schveighoffer wrote:
>>>> auto sp1 = "a|b|c".splitter('|');
>>>>
>>>> writeln(sp1.back); // ok
>>>>
>>>> auto sp2 = "a.b|c".splitter!(v => !isAlphaNum(v));
>>>>
>>>> writeln(sp2.back); // error, not bidirectional
>>>>
>>>> Why? is it an oversight, or is there a good reason for it?
>>>>
>>>
>>> I believe the reason is two-fold. First, splitter is lazy.
>>> Second, the range splitting is defined in the forward
>>> direction, not the reverse direction. A bidirectional range
>>> is only supported if it is guaranteed that the splits will
>>> occur at the same points in the range when run in either
>>> direction. That's why the single element delimiter is
>>> supported. It's clearly the case for the predicate function in
>>> your example. If that's known to be always true then perhaps
>>> it would make sense to enhance splitter to generate
>>> bidirectional results in this case.
>>>
>>
>> Note that the predicate might use a random number generator to
>> pick the split points. Even for the same sequence of random
>> numbers, the split points would be different if run from the
>> front than if run from the back.
>
> I think this isn't a good explanation.
>
> All forms of splitter accept a predicate (including the one
> which supports a bi-directional result). Many other phobos
> algorithms that accept a predicate provide bidirectional
> support. The splitter result is also a forward range (which
> makes no sense in the context of random splits).
>
> Finally, I'd suggest that even if you split based on a subrange
> that is also bidirectional, it doesn't make sense that you
> couldn't split backwards based on that. Common sense says a
> range split on substrings is the same whether you split it
> forwards or backwards.
>
> I can do this too (and in fact I will, because it works, even
> though it's horrifically ugly):
>
> auto sp3 = "a.b|c".splitter!((c, unused) => !isAlphaNum(c))('?');
>
> writeln(sp3.back); // ok
>
> Looking at the code, it looks like the first form of splitter
> uses a different result struct than the other two (which have a
> common implementation). It just needs cleanup.
>
> -Steve

I think the idea is that if a construct like 'xyz.splitter(args)'
produces a range with the sequence of elements {"a", "bc",
"def"}, then 'xyz.splitter(args).back' should produce "def". But
if finding the split points starting from the back results in
something like {"f", "de", "abc"}, then that relationship hasn't
held, and the results are unexpected.

Note that in the above example, 'xyz.retro.splitter(args)' might
produce {"f", "ed", "cba"}, so again not the same.

Another way to look at it: if split (eager) took a predicate,
then 'xyz.splitter(args).back' and 'xyz.split(args).back' should
produce the same result. But they will not with the example given.

I believe these consistency issues are the reason why the
bidirectional support is limited.
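
To make the consistency concern concrete, here is a minimal sketch. It
does not use the predicate case from this thread. Instead it uses a
multi-character separator whose matches can overlap, where the split
points really do depend on scan direction. 'splitFromBack' is a
hypothetical helper written for this illustration; it is not part of
Phobos, and it assumes a non-empty separator.

```d
import std.algorithm : equal, splitter;
import std.string : lastIndexOf;

// Hypothetical helper (not part of Phobos): an eager split that looks
// for separators starting from the back of the string.
// Assumes sep is non-empty.
string[] splitFromBack(string s, string sep)
{
    string[] pieces;
    for (;;)
    {
        const i = s.lastIndexOf(sep);
        if (i < 0)
        {
            // No separator left: what remains is the first piece.
            return [s] ~ pieces;
        }
        const idx = cast(size_t) i;
        pieces = [s[idx + sep.length .. $]] ~ pieces;
        s = s[0 .. idx];
    }
}

void main()
{
    // Scanning from the front, the first "||" starts at index 1.
    assert(equal("a|||b".splitter("||"), ["a", "|b"]));

    // Scanning from the back, the last "||" starts at index 2.
    assert(splitFromBack("a|||b", "||") == ["a|", "b"]);
}
```

Here the last element seen from the front is "|b", while a backward
scan would report "b", which is the kind of mismatch between
'splitter(args)' and 'splitter(args).back' described above.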

Note: I didn't design any of this, but I did redo the examples in
the documentation at one point, which is why I looked at this.

--Jon