The Case Against Autodecode

Tue May 31 08:07:09 PDT 2016

On 5/31/16 3:56 AM, Walter Bright wrote:
> On 5/30/2016 9:16 PM, Andrei Alexandrescu wrote:
>> On 5/30/16 5:51 PM, Walter Bright wrote:
>>> On 5/30/2016 8:34 AM, Marc Schütz wrote:
>>>> In an ideal world, we'd also want to change the way `length` and
>>>> `opIndex` work,
>>>
>>> Why? Strings are arrays of code units. All the trouble comes from
>>> erratically pretending otherwise.
>>
>> That's not an argument.
>
> Consistency is a factual argument, and autodecode is not consistent.

Consistent with what?

>> Objects are arrays of bytes, or tuples of their fields, etc. The whole
>> point of encapsulation is superimposing a more structured view on top
>> of the representation. Operating on open-heart representation is
>> risky, and strings are no exception.
>
> If there is an abstraction for strings that is efficient, consistent,
> useful, and hides the fact that it is UTF, I am not aware of it.

It's been mentioned several times: a string type that does not offer
range primitives; instead, it offers explicit primitives (such as
byCodeUnit, byCodePoint, byGrapheme, etc.) that yield appropriate ranges.
-- Andrei
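The design described above can be sketched in D roughly as follows. This is a minimal illustration under stated assumptions, not Phobos API and not the proposal's actual implementation: the wrapper name `Text` is hypothetical, and its three accessors simply forward to existing Phobos ranges (`std.utf.byCodeUnit`, `std.utf.byDchar` for code points, and `std.uni.byGrapheme`). For contrast, it also shows the inconsistency under discussion: a plain `string` reports `.length` in code units while its range elements are autodecoded code points.

```d
// Hypothetical sketch: `Text` is an illustrative name, not a Phobos type.
static import std.uni;
static import std.utf;
import std.range : walkLength;
import std.range.primitives : ElementType, isInputRange;

/// A string wrapper that is deliberately NOT a range itself.
struct Text
{
    private string data; // stored as UTF-8 code units

    this(string s) { data = s; }

    // Each accessor names its level of abstraction explicitly,
    // forwarding to ranges that already exist in Phobos.
    auto byCodeUnit() { return std.utf.byCodeUnit(data); }
    auto byCodePoint() { return std.utf.byDchar(data); }
    auto byGrapheme() { return std.uni.byGrapheme(data); }
}

// Text has no empty/front/popFront, so generic range algorithms
// cannot silently pick an iteration level for it.
static assert(!isInputRange!Text);

// A plain string, by contrast, is a range of decoded code points
// (autodecoding), even though its .length counts code units.
static assert(is(ElementType!string == dchar));

void main()
{
    import std.stdio : writeln;

    // 'e' followed by U+0301 COMBINING ACUTE ACCENT displays as one character.
    string s = "e\u0301";
    writeln(s.length);     // 3: UTF-8 code units ('e' = 1, U+0301 = 2)
    writeln(s.walkLength); // 2: autodecoded code points

    auto t = Text(s);
    writeln(t.byCodeUnit.walkLength);  // 3
    writeln(t.byCodePoint.walkLength); // 2
    writeln(t.byGrapheme.walkLength);  // 1
}
```

Because `Text` exposes no range primitives, passing it directly to a range algorithm fails to compile; the caller has to state which level (code units, code points, or graphemes) the operation is meant to work on.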