std.algorithm.splitter on a string not always bidirectional

Steven Schveighoffer schveiguy at gmail.com
Sat Jan 23 15:07:27 UTC 2021


On 1/22/21 2:13 PM, Jon Degenhardt wrote:
> On Friday, 22 January 2021 at 17:29:08 UTC, Steven Schveighoffer wrote:
>> On 1/22/21 11:57 AM, Jon Degenhardt wrote:
>>>
>>> I think the idea is that if a construct like 'xyz.splitter(args)' 
>>> produces a range with the sequence of elements {"a", "bc", "def"}, 
>>> then 'xyz.splitter(args).back' should produce "def". But, if finding 
>>> the split points starting from the back results in something like 
>>> {"f", "de", "abc"} then that relationship hasn't held, and the 
>>> results are unexpected.
>>
>> But that is possible with all 3 splitter variants. Why is one allowed 
>> to be bidirectional and the others are not?
> 
> I'm not defending it, just explaining what I believe the thinking was 
> based on the examination I did. It wasn't just looking at the code, 
> there was a discussion somewhere. A forum discussion, PR discussion, bug 
> or code comments. Something somewhere, but I don't remember exactly.
> 
> However, to answer your question - The relationship described is 
> guaranteed if the basis for the split is a single element. If the range 
> is a string, that's a single 'char'. If the range is composed of 
> integers, then a single integer. Note that if the basis for the split is 
> itself a range, then the relationship described is not guaranteed.
> 
> Personally, I can see a good argument that bidirectionality should not 
> be supported in any of these cases, and instead force the user to choose 
> between eager splitting or reversing the range via retro. For the common 
> case of strings, the further argument could be made that the distinction 
> between char and dchar is another point of inconsistency.

I would not want that. My use case is splitting a string on punctuation, 
and using the lazy result for testing equality of something. But I have 
some special suffix items that I want to handle first (and pop off).
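
To make that use case concrete, here is a minimal sketch. The input string and the '.' delimiter are made up for illustration; my real data looks different. It relies only on the single-element overload of splitter, which is bidirectional when the source range is.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;

void main()
{
    // Hypothetical input; the actual strings are not shown in this thread.
    auto parts = "foo.bar.baz.suffix".splitter('.');

    // Single-element delimiter on a string: the result is bidirectional,
    // so the trailing item can be inspected and popped off first.
    assert(parts.back == "suffix");
    parts.popBack();

    // What remains is still lazy and can be compared element by element.
    assert(parts.equal(["foo", "bar", "baz"]));
}
```

Forcing eager splitting or retro here would mean either allocating an array or giving up front-to-back order for the comparison.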

dchar/char inconsistency isn't a problem, because they are both dchar 
ranges (and both are bidirectional).

> Regardless whether the choices made were the best choices, there was 
> some thinking that went into it, and it is worth understanding the 
> thinking when considering changes.

I believe there was that thinking. It's why I posted, because before I 
filed a bug, I wanted to make sure there wasn't a good reason.

It looks like there is NOT a good reason to prevent bidirectional access 
for the single-item based splitting you describe. But there IS a good 
reason (thanks for the example, H.S. Teoh) to prevent it for 
multi-element delimiters.
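
A rough sketch of the difference, as I understand the Phobos behavior discussed in this thread. The inputs "a,bc,def" and "aaa" are my own illustrations and not necessarily the example from the earlier message. With a single-element delimiter every split point is fixed no matter which end you start from. With the delimiter "aa" on "aaa", scanning from the front matches the first two characters, while scanning from the back would match the last two, so the two directions would disagree.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;
import std.range.primitives : isBidirectionalRange, isForwardRange;

void main()
{
    // Single-element delimiter: split points are unambiguous, and the
    // result supports back/popBack.
    auto single = "a,bc,def".splitter(',');
    static assert(isBidirectionalRange!(typeof(single)));
    assert(single.back == "def");

    // Multi-element delimiter: forward only (assumption: this matches the
    // Phobos version under discussion here).
    auto multi = "aaa".splitter("aa");
    static assert(isForwardRange!(typeof(multi)));
    static assert(!isBidirectionalRange!(typeof(multi)));

    // From the front, "aa" matches at index 0, leaving "" and then "a".
    // A scan from the back would instead match the last two characters,
    // which would leave "a" and then "".
    assert(multi.equal(["", "a"]));
}
```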

-Steve