Request for "indexOf" undeprecation.

Sönke Ludwig sludwig at outerproduct.org
Fri Nov 16 08:33:27 PST 2012


On 16.11.2012 16:08, monarch_dodra wrote:
> On Friday, 16 November 2012 at 14:50:26 UTC, Sönke Ludwig wrote:
>> On 16.11.2012 15:02, monarch_dodra wrote:
>>>> wait, no, that one
>>>> also screws up and returns based on the ASCII state of the search character!
>>>
>>> It doesn't screw up the result, it is meant for slicing your string.
>>
>> Just that it returns the index of the _code point_ if you pass a non-ASCII character as the search
>> term and a byte index only if you pass in an ASCII char.
> 
> I'm not sure what you are trying to say? Aren't those the same?

A code point index would mean that it returns 1 instead of 3 in the example. But forget what I said.
I just tested it and was surprised to find that in 'foreach( i, dchar ch; str )', i contains byte
offsets and not code point indices, so the indexOf() implementation is fully correct after all.
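
For reference, a minimal sketch of what I tested. The helper name byteOffsets is made up for this
example, and I am assuming std.string.indexOf and std.range.walkLength as found in Phobos:

```d
import std.range : walkLength;
import std.string : indexOf;

// Hypothetical helper: collects the index variable that foreach
// produces while decoding a UTF-8 string into dchars.
size_t[] byteOffsets(string str)
{
    size_t[] offsets;
    foreach (i, dchar ch; str)
        offsets ~= i; // i is the offset of the first code unit of ch
    return offsets;
}

void main()
{
    string str = "日本語"; // each character takes 3 UTF-8 code units

    // The foreach index advances in code units (bytes), not code points.
    assert(byteOffsets(str) == [0, 3, 6]);

    // indexOf reports the same kind of index, so it can be used for slicing.
    auto idx = str.indexOf('本');
    assert(idx == 3);
    assert(str[idx .. $] == "本語");

    // The code point index has to be computed separately if it is needed.
    assert(str[0 .. idx].walkLength == 1);
}
```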

> 
>>>> Btw. is 'string' actually considered a RA range? After all it provides no useful invariants apart
>>>> from str[0] == str.front - str[1] could be different from str.popFront(); str.front.
>>>
>>> No! String is not an RA range. It can be indexed, but isRandomAccessRange!string is false. This is a
>>> fundamental aspect of string, to avoid accidentally breaking it.
>>>
>>> dstring, however, is random access. You should always take this into account for considering whether
>>> or not it is worth converting to before operating on it. For example: sorting the chars in a
>>> dstring: Easy as pie. Doing it on a string: Not sure if even possible.
>>
>> In that case I got confused by the example. I thought you wanted to make
>> "日本語".indexOf('本') == 3
>> possible again.
> 
> I did.
> 
>> But that wouldn't work if indexOf operates on RA ranges. If you do have a RA though,
>> how is the result of countUntil different from indexOf? If you have an actual char-range, countUntil
>> should also return 3...
> 
> One of the problems is that the semantics of a char range containing a utf payload is not very well
> formalized. In particular, there is no particular rule that states that popFront will pop an entire
> utf-sequence, or just a single char. In this situation, we can't really say what countUntil would
> return...
> 
> However, it is perfectly legal to decode a forward range, so I don't see why you wouldn't be able to
> explicitly search the index in a RA range, as opposed to the amount of popFronts needed to get
> there. It is two different operations, and they have been (IMO) erroneously merged together.
> 
> One of the workarounds is to "find" instead, and then calculate `r.length - r.find("本").length`.
> But: 1. This is a pain to do. 2. Doesn't work on infinite ranges (which it should).
> 
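
The workaround quoted above could be sketched roughly as follows. The function name indexViaFind is
hypothetical and only std.algorithm.find is assumed; the sketch ignores the empty-needle corner case
and, as noted, it cannot work on infinite ranges because it needs length:

```d
import std.algorithm : find;

// Hypothetical helper: derives a slicing index from find() by
// comparing the length of the haystack with the length of the
// remainder that starts at the match.
ptrdiff_t indexViaFind(string haystack, string needle)
{
    auto rest = haystack.find(needle);
    if (rest.length == 0)
        return -1; // no match: find returned an empty range
    return cast(ptrdiff_t)(haystack.length - rest.length);
}

void main()
{
    assert(indexViaFind("日本語", "本") == 3);  // code unit (byte) index
    assert(indexViaFind("日本語", "x") == -1);
}
```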

As far as I can see, allowing divergent behavior for index-based access and popFront/front would
basically mean that no sensible algorithm could be implemented. What should a generic algorithm
do with an RA range that yields double values through front/popFront but byte values through index
access? But I guess Andrei has some more specific ideas here.
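
To make the divergence concrete for string, here is a small sketch. The helper secondViaRange is
made up for illustration; the range primitives and isRandomAccessRange are assumed to come from
std.range:

```d
import std.range;

// string can be indexed, yet it is deliberately not a random-access range,
// whereas dstring is one.
static assert(!isRandomAccessRange!string);
static assert(isRandomAccessRange!dstring);

// Hypothetical helper: the second element as seen through the range interface.
dchar secondViaRange(string s)
{
    s.popFront();   // pops one whole code point (3 code units here)
    return s.front; // decodes the next code point
}

void main()
{
    string s = "日本語";

    // Range access yields decoded code points ...
    assert(secondViaRange(s) == '本');
    // ... while index access yields a single UTF-8 code unit.
    assert(s[1] == 0x97); // second byte of the encoding of '日'
    assert(s[1] != secondViaRange(s));

    // For dstring both views agree, which is what generic code relies on.
    dstring d = "日本語"d;
    assert(d[1] == '本');
}
```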

