Narrow string is not a random access range

Wed Oct 24 04:39:50 PDT 2012

On 10/24/2012 01:07 PM, Jonathan M Davis wrote:
> On Wednesday, October 24, 2012 12:42:59 mist wrote:
>> On Tuesday, 23 October 2012 at 17:36:53 UTC, Simen Kjaeraas wrote:
>>> On 2012-10-23, 19:21, mist wrote:
>>>> Hm, and all Phobos functions should operate on narrow strings
>>>> as if they were not random-accessible? I am thinking about
>>>> something like commonPrefix from std.algorithm, which operates
>>>> on code points for strings.
>>>
>>> Preferably, yes. If there are performance (or other) benefits
>>> from operating on code units, and it's just as safe, then
>>> operating on code units is ok.
>>
>> Probably I don't understand it fully, but the D approach has always
>> been "safe first, fast with some additional syntax". Back to
>> commonPrefix and take:
>>
>> ==========================
>> import std.stdio, std.traits, std.algorithm, std.range;
>>
>> void main()
>> {
>> 	auto beer = "Пиво";
>> 	auto r1 = beer.take(2);
>> 	auto pony = "Пони";
>> 	auto r2 = commonPrefix(beer, pony);
>> 	writeln(r1);
>> 	writeln(r2);
>> }
>> ==========================
>>
>> First one returns 2 symbols. Second one - 3 code units and
>> broken string. There is no way such inconsistency by default in
>> the standard library is understandable to a newbie.
>
> We don't really have much choice here. As long as strings are arrays of code
> units, it wouldn't work to treat them as ranges of their elements, because
> that would be a complete disaster for Unicode. You'd be operating on code
> units rather than code points, which is almost always wrong.

There are plenty of cases where it makes no difference, or where
iterating by code point is harmful, or just as incorrect.

str.filter!(a=>a!='x'); // works for all str iterated by
                         // code point or by code unit

string x = str.filter!(a=>a!='x').array; // only works in the latter case

dstring s = "ÅA";
dstring g = s.filter!(a=>a!='A').array;
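
To make the three snippets above concrete, here is a minimal sketch. The
helper names dropAsciiByCodeUnit and dropByCodePoint are made up for
illustration, and the last part assumes the "Å" is meant as 'A' followed
by U+030A (combining ring above); with a precomposed Å the filter would
leave it alone.

```d
import std.algorithm : filter;
import std.array : array;
import std.string : representation;
import std.utf : validate;

// Hypothetical helper: drop an ASCII character by filtering raw UTF-8
// code units. Bytes below 0x80 never occur inside a multi-byte sequence,
// so removing them cannot split a code point.
string dropAsciiByCodeUnit(string s, char c)
{
    assert(c < 0x80, "only safe for ASCII characters");
    auto bytes = s.representation.filter!(b => b != c).array;
    auto result = cast(string) bytes; // reinterpret the bytes as UTF-8
    validate(result);                 // throws UTFException if malformed
    return result;
}

// Hypothetical helper: the same filter through the decoding range
// interface. The elements are dchar, so the result is dchar[], not string.
dchar[] dropByCodePoint(string s, dchar c)
{
    return s.filter!(a => a != c).array;
}

void main()
{
    string str = "Пxиxво";

    // Same characters either way, but only the code unit version
    // hands back a string without a transcoding step.
    assert(dropAsciiByCodeUnit(str, 'x') == "Пиво");
    assert(dropByCodePoint(str, 'x') == "Пиво"d);
    static assert(!is(dchar[] : string));

    // Assumption: 'A' + U+030A (combining ring above), then a plain 'A'.
    dstring s = "A\u030AA"d;
    auto g = s.filter!(a => a != 'A').array;
    // Both 'A's are gone, including the base of the first character;
    // only the orphaned combining mark is left.
    assert(g == "\u030A"d);
}
```

So decoding to code points buys nothing in the first case, costs a
conversion in the second, and is still wrong at the grapheme level in
the third.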

> Pretty much the
> only way to really solve the problem as long as strings are arrays with all of
> the normal array operations is for the std.range traits (hasLength,
> hasSlicing, etc.) and the range functions for arrays in std.array (e.g. front,
> popFront, etc.) to treat strings as ranges of code points (dchar), which is
> what they do. The result _is_ confusing, but as long as strings are arrays of
> code units like they are now, to do anything else would result in incorrect
> behavior.

It would result in by-code-unit behavior.
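
For reference, a small sketch of what the two views look like, assuming
the std.range traits behave as described in the quoted text (strings
treated as ranges of dchar). std.string.representation is used here only
as one way to get at the code units as a plain array.

```d
import std.range; // isRandomAccessRange, hasLength, hasSlicing, ElementType, front, walkLength
import std.string : representation;

// The range traits see a narrow string as a bidirectional range of dchar.
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);
static assert(!hasSlicing!string);
static assert(isBidirectionalRange!string);
static assert(is(ElementType!string == dchar));

// A dstring has one code unit per code point, so nothing special happens.
static assert(isRandomAccessRange!dstring);

// The by-code-unit view is an ordinary random access range.
static assert(isRandomAccessRange!(immutable(ubyte)[]));

void main()
{
    string beer = "Пиво";

    // The language-level array operations still count code units...
    assert(beer.length == 8);
    assert(beer[0] == 0xD0);

    // ...while the range primitives decode code points.
    assert(beer.walkLength == 4);
    assert(beer.front == 'П');

    // The code unit view agrees with the array operations.
    assert(beer.representation.front == 0xD0);
    assert(beer.representation.length == 8);
}
```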

> There just isn't a good solution given what strings currently are in
> the language itself.
>
> Andrei's suggestion would work if Walter could be talked into it, but that
> doesn't look like it's going to happen. And making it so that strings are
> structs which hold arrays of code units could work, but without language
> support, it's likely to have major issues. String literals would have to
> become the struct type, which could cause issues with calling C functions, and
> the code breakage would be _way_ larger than with Andrei's suggestion, since
> arrays of code units would no longer be strings at all.
> ...

You realize that the proposed solution is that arrays of code units
would no longer be arrays of code units?