The Case Against Autodecode

Sat May 28 15:29:12 PDT 2016

On Saturday, 28 May 2016 at 19:04:14 UTC, Walter Bright wrote:
> On 5/28/2016 5:04 AM, Andrei Alexandrescu wrote:
>> So it harkens back to the original mistake: strings should NOT 
>> be arrays with
>> the respective primitives.
>
> An array of code units provides consistency, predictability, 
> flexibility, and performance. It's a solid base upon which the 
> programmer can build what he needs as required.
>
> A string class does not do that (from the article: "I admit the 
> correct answer is not always clear").

You're right. An "array of code units" is a very useful low-level 
primitive. I've dealt with a lot of code that uses these (more or 
less correctly) in various languages.

But when providing such a thing, I think it's very important to 
make it *look* like a low-level primitive, and use the type 
system to distinguish it from higher-level ones.
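To make that concrete, here is a minimal sketch in D of what "looking like a low-level primitive" could mean. The names `CodeUnits` and `asCodeUnits` are made up for illustration and are not part of Phobos. The only point is that the wrapper never converts to or from `string` implicitly.

```d
import std.stdio : writeln;

/// Hypothetical wrapper (not a Phobos type): the name says "raw code units".
struct CodeUnits
{
    immutable(char)[] data;   // raw UTF-8 code units, no decoding implied

    size_t length() const { return data.length; }       // number of code units
    char opIndex(size_t i) const { return data[i]; }    // one code unit
}

/// Explicit, named entry point; nothing converts implicitly.
CodeUnits asCodeUnits(string s)
{
    return CodeUnits(s);
}

// There is no `alias this`, so neither direction converts silently.
static assert(!is(CodeUnits : string));
static assert(!is(string : CodeUnits));

void main()
{
    // U+00E9 takes two UTF-8 code units, so this is 6 code units long.
    auto raw = asCodeUnits("h\u00E9llo");
    assert(raw.length == 6);
    writeln(raw.length);
}
```

Anyone who wants code-unit semantics has to ask for them by name, which is the distinction I mean.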

For example, a string literal should not implicitly convert into 
an array of code units. What should it implicitly convert to? I'm 
not sure. Probably something close to how it looks in the source 
code. A sequential range of graphemes? From all the detail in 
this thread, I now wonder whether "a grapheme" is even an 
unambiguous concept across different environments. But one thing 
I am sure of (and this comes from other languages and APIs, not 
from D specifically): a function that converts from one 
representation to another without keeping track of the change is 
a "bug farm". Keeping track could mean returning a different 
compile-time type, or keeping state in a "string" class that 
records whether it is in normalized form.
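As a rough sketch of the "different compile-time type" option: the code below uses `std.uni.normalize` from Phobos, while `NfcString`, `toNfc` and `sameText` are hypothetical names invented for this example. It assumes NFC is the form you care about.

```d
import std.uni : normalize, NFC;

/// Hypothetical type: a string that is known to be in NFC.
struct NfcString
{
    // `private` is module-scoped in D, so this only stops code in
    // other modules from building an NfcString by hand.
    private string payload;

    string toString() const { return payload; }
}

/// The intended way to get an NfcString, so the type records the conversion.
NfcString toNfc(string s)
{
    return NfcString(normalize!NFC(s));
}

/// Comparison of normalized text is only offered for the tracked type.
bool sameText(NfcString a, NfcString b)
{
    return a.toString == b.toString;
}

void main()
{
    string composed   = "\u00E9";    // precomposed e-acute
    string decomposed = "e\u0301";   // 'e' followed by combining acute accent

    // As raw code units the two spellings differ...
    assert(composed != decomposed);

    // ...but once normalized, and tracked as such, they compare equal.
    assert(sameText(toNfc(composed), toNfc(decomposed)));

    // Passing untracked strings is rejected at compile time.
    static assert(!__traits(compiles, sameText(composed, decomposed)));
}
```

The conversion cannot be forgotten here because the result has a different type. The other option, a flag inside a string class, would do the same check at run time instead.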