Why the hell doesn't foreach decode strings

Fri Oct 21 11:11:14 PDT 2011

On 21/10/11 3:26 AM, Walter Bright wrote:
> On 10/20/2011 2:49 PM, Peter Alexander wrote:
>> The whole mess is caused by conflating the idea of an array with a variable
>> length encoding that happens to use an array for storage. I don't believe
>> there is any clean and tidy way to fix the problem without breaking
>> compatibility.
>
> There is no 'fixing' it, even to break compatibility. Sometimes you want
> to look at an array of utf8 as 8 bit characters, and sometimes as 20 bit
> dchars. Someone will be dissatisfied no matter what.

Then separate those ways of viewing strings.
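
For reference, here is a small sketch of the two views as they coexist on the same type today. The helper names countCodeUnits and countCodePoints are made up for illustration; the only thing that differs between them is the element type asked of foreach.

```d
import std.stdio : writeln;

// Array view: a plain foreach over a string yields UTF-8 code units (char).
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (c; s)          // c is a single char code unit, no decoding
        ++n;
    return n;
}

// Decoded view: asking for dchar makes foreach decode UTF-8 on the fly.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)    // c is a whole code point
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo";    // U+00E9 takes two UTF-8 code units

    assert(s.length == 6);               // .length counts code units
    assert(countCodeUnits(s) == 6);
    assert(countCodePoints(s) == 5);

    writeln(countCodeUnits(s), " code units, ",
            countCodePoints(s), " code points");
}
```

Both loops look almost identical, which is exactly why the difference is easy to miss.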

Here's one solution that I believe would satisfy everyone:

1. Remove the string, wstring and dstring aliases. An array of char
should be just an array of char, i.e. behave the same as an array of
byte. Same for arrays of wchar and dchar. This way, arrays of T have no
subtle differences for certain kinds of T.

2. Add string, wstring and dstring structs with the following interface:

  a. foreach should iterate as dchar.
  b. @property front() would be dchar.
  c. @property length() would not exist.
  d. @property buffer() returns the underlying immutable array of char, 
wchar etc.
  e. opIndex and co. would not exist.
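
A rough sketch of what such a struct might look like for the UTF-8 case, assuming it is built on std.utf's decode and stride. The name String is a stand-in (the real proposal would reuse the name string once the alias is gone), and the data field, constructor, empty and popFront are my additions to make the range interface complete; none of this is an actual Phobos type.

```d
import std.utf : decode, stride;

// Hypothetical wrapper; "String" stands in for the proposed string struct.
struct String
{
    private immutable(char)[] data;

    this(immutable(char)[] s) { data = s; }

    // (d) explicit access to the underlying code units
    @property immutable(char)[] buffer() { return data; }

    // (a), (b) range primitives that decode, so foreach yields dchar
    @property bool empty() { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);     // decodes one code point at index 0
    }

    void popFront()
    {
        // stride gives the number of code units in the first code point
        data = data[stride(data, 0) .. $];
    }

    // (c), (e) deliberately no length and no opIndex
}

void main()
{
    auto s = String("h\u00E9llo");

    size_t n = 0;
    foreach (c; s)                  // iterates a copy of the range
    {
        static assert(is(typeof(c) == dchar));
        ++n;
    }
    assert(n == 5);                 // five code points
    assert(s.buffer.length == 6);   // six UTF-8 code units underneath

    static assert(!__traits(compiles, s.length));
    static assert(!__traits(compiles, s[0]));
}
```

Anyone who wants the 8-bit view has to write s.buffer, so the choice of view is always visible at the call site.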

What this does:
- Makes all array types consistent and intuitive.
- Makes looping over strings do the expected thing.
- Provides an interface to the underlying 8-bit chars for those who
want it.

Of course, people will still need to understand UTF-8. I don't think 
that's a problem. It's unreasonable to expect the language to do the 
thinking for you. The problem is that we have people who *do*
understand UTF-8 (like the OP), but *don't* understand D's strings.