Range of chars (narrow string ranges)

Fri Apr 24 16:33:54 PDT 2015

On Friday, 24 April 2015 at 20:44:34 UTC, Walter Bright wrote:
> On 4/24/2015 11:52 AM, H. S. Teoh via Digitalmars-d wrote:
>> I really wish we would just *make the darn decision* already, whether to
>> kill off autodecoding or not, and MAKE IT CONSISTENT ACROSS PHOBOS,
>> instead of introducing this schizophrenic dichotomy where some functions
>> give you a range of dchar while others give you a range of char/wchar,
>> and the two don't work well together. This is totally going to make a
>> laughing stock of D one day.
>
> Some facts:
>
> 1. When I started D, there was a lot of speculation about 
> whether the world would settle on UTF8, UTF16, or UTF32. So D 
> supports natively all three. Time has shown, however, that UTF8 
> has pretty much won. wchar only exists for Windows API and 
> Java, dchar strings pretty much don't exist in the wild.
>
> 2. dchar is very useful as a character type, but not as a 
> string type.
>
> 3. Pretty much none of the algorithms in Phobos work when 
> presented with a range of chars or wchars. This is not even 
> documented.
>
> 4. Autodecoding is inefficient, especially considering that few 
> algorithms actually need decoding. Re-encoding the result back 
> to UTF8 is another inefficiency.
>
> I'm afraid we are stuck with autodecoding, as taking it out may 
> be far too disruptive.
>
> But all is not lost. The Phobos algorithms can all be fixed to 
> not care about autodecoding. The changes I've made to 
> std.string all reflect that.
>
> https://github.com/D-Programming-Language/phobos/pulls/WalterBright
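
For anyone following along who hasn't been bitten by this yet, here is a minimal sketch of what autodecoding means in practice. The helper names codeUnitCount and codePointCount are made up for illustration; ElementType, walkLength and byCodeUnit are the existing Phobos symbols. It assumes a Phobos recent enough to have std.utf.byCodeUnit.

```d
import std.range : walkLength;
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical helper: number of UTF-8 code units, no decoding involved.
size_t codeUnitCount(string s)
{
    return s.length;
}

// Hypothetical helper: number of code points, counted by walking the
// string through the autodecoding range primitives.
size_t codePointCount(string s)
{
    return s.walkLength;
}

// With autodecoding, a string treated as a range has dchar elements...
static assert(is(ElementType!string == dchar));

// ...while byCodeUnit exposes the underlying chars without decoding.
static assert(is(Unqual!(ElementType!(typeof("".byCodeUnit))) == char));
```

For "h\u00E9llo", the first helper sees six code units and the second sees five code points, which is exactly the range-of-char versus range-of-dchar split being argued about.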

I really think that leaving things with autodecoding in some 
cases and not in others is just asking for trouble. Even if we 
manage to figure out how to fix it so that Phobos doesn't 
autodecode in any of its algorithms without breaking any user 
code in the process, that still leaves user code with the 
problem, and since Phobos _wouldn't_ have it, the behavior would 
be all the more confusing.

It _is_ possible to get rid of it entirely without breaking code 
if we move the array range primitives to a new module and later 
deprecate the old ones. That would probably mean breaking up 
std.array into submodules and deprecating _all_ of it in favor of 
its submodules, since anyone importing std.array would otherwise 
get the old array range primitives rather than the new ones - or 
both, causing conflicts. It's made worse by the fact that 
std.range publicly imports std.array. So, yes, it _is_ ugly. But 
it _can_ be done.
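
To make the idea concrete, here is a rough sketch of what non-decoding array range primitives could look like if they lived in their own module. This is only an illustration under my own assumptions: no such module exists in Phobos, countElements is a made-up helper, and a real version would need the usual attributes and constraints.

```d
// Sketch of array range primitives that never decode. Assumed to live in
// a hypothetical new module that user code would import instead of the
// existing primitives (importing both would cause conflicts).

// An array is empty when it has no elements left.
@property bool empty(T)(const T[] a)
{
    return a.length == 0;
}

// front is always the first element: for a string, the first code unit,
// not a decoded dchar.
@property ref T front(T)(T[] a)
{
    assert(a.length, "front called on an empty array");
    return a[0];
}

// popFront always advances by exactly one element (one code unit).
void popFront(T)(ref T[] a)
{
    assert(a.length, "popFront called on an empty array");
    a = a[1 .. $];
}

// Hypothetical helper: count elements by walking with the primitives above.
size_t countElements(T)(T[] a)
{
    size_t n;
    for (; !a.empty; a.popFront())
        ++n;
    return n;
}
```

With these in scope instead of the current ones, "h\u00E9llo" walks as six chars, and after one popFront the front is the first code unit of the two-byte sequence rather than a decoded code point.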

If we leave autodecoding in and just work around it everywhere in 
Phobos, it's just going to forever screw with user code and 
confuse users. They get confused enough by it as it is, and at 
least now, they're running into it in Phobos where we can explain 
it, whereas if they don't see it with Phobos and only with their 
own code, then they're going to think that they're doing 
something wrong and potentially get very frustrated.

I definitely share the concern that removing autodecoding 
outright will be too disruptive, but at the same time, I don't 
know if we can afford to go halfway with it.