dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

jfondren julian.fondren at gmail.com
Sun Nov 7 01:59:47 UTC 2021


On Sunday, 7 November 2021 at 01:12:19 UTC, zjh wrote:
> On Thursday, 4 November 2021 at 08:24:59 UTC, zjh wrote:
>
> The `fundamental` problem is that we should provide users with 
> `options` at compile time, not we `choose` for users.
> If you `choose` for users, there will always be dissatisfaction.
> You provide options, and users choose according to their needs.
>
> `auto decoding` and `utf8 string encoding` are both like this. 
> If you choose for users, some people are always not happy.

D index with range checking: `arr[ind]`
D index without range checking: `arr.ptr[ind]`

C++ index with range checking: `arr.at(ind)`
C++ index without range checking: `arr[ind]`

There are two ways to index, and both D and C++ offer both. 
Neither language removes a choice. If whether `arr[ind]` should 
range-check were up for debate, the debate would be about what the 
language should encourage by making it the default: the option 
that's more naturally expressed and requires less typing.
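
A minimal sketch of the two D spellings side by side. The helper 
names `checkedAt` and `uncheckedAt` are made up for illustration, 
and the `RangeError` line assumes a build with bounds checks left 
on (the default without `-release` or `-boundscheck=off`):

```d
import core.exception : RangeError;
import std.exception : assertThrown;

// Hypothetical helpers, named only for this illustration.
int checkedAt(int[] arr, size_t ind)
{
    return arr[ind];      // default spelling: bounds-checked
}

int uncheckedAt(int[] arr, size_t ind)
{
    return arr.ptr[ind];  // opt-out spelling: no bounds check
}

void main()
{
    int[] arr = [10, 20, 30];

    // In range, both spellings agree.
    assert(checkedAt(arr, 1) == 20);
    assert(uncheckedAt(arr, 1) == 20);

    // Out of range, the default spelling throws RangeError
    // (assumes bounds checks are enabled in this build).
    assertThrown!RangeError(checkedAt(arr, 3));

    // uncheckedAt(arr, 3) would be undefined behavior, so it is not run.
}
```

The unchecked form takes more typing and is rejected in `@safe` 
code, which is how the language nudges people toward the default.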

The question here of "what should a foreach over the dchars of a 
char[] do?" is the same kind of question.

default: `str`
throwing: `str.byUTF!(dchar, UseReplacementDchar.no)`
asserting: `std.encoding.codePoints(str)`
replacement: `std.utf.byDchar(str)`
truncation: `str[0 .. std.encoding.validLength(str)]`
promotion: `std.string.representation(str)`

Put one of those inside `foreach (dchar c; ...) { }` and you get 
that handling of bad UTF. Changing the default doesn't make the 
other options go away, and the default has to do *something* 
(even a compile-time error of "this is not supported behavior" is 
*something*), so you have to make a choice about the default and 
make some users unhappy.
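
As a rough sketch, here is how several of those options behave on 
one bad input. `collect` and `badInput` are hypothetical helpers 
written for this example, the throwing flag is spelled 
`No.useReplacementDchar` from `std.typecons`, and the input puts 
the invalid byte last because the number of code units swallowed 
by a replacement can differ when other bytes follow it. The 
`codePoints` variant is left out since its check is an `assert` 
contract that may be compiled away:

```d
import std.encoding : validLength;
import std.exception : assertThrown;
import std.string : representation;
import std.typecons : No;
import std.utf : byDchar, byUTF, replacementDchar;

// Hypothetical helper: gathers what `foreach (dchar c; r)` yields.
dchar[] collect(R)(R r)
{
    dchar[] result;
    foreach (dchar c; r)
        result ~= c;
    return result;
}

// Hypothetical helper: "ab" followed by 0xFF, a byte that never
// appears in valid UTF-8.
string badInput()
{
    immutable ubyte[] raw = [0x61, 0x62, 0xFF];
    return cast(string) raw;
}

void main()
{
    string str = badInput();

    // default: the built-in decoding foreach throws on the bad byte
    assertThrown(collect(str));

    // throwing: the same outcome, requested explicitly
    assertThrown(collect(str.byUTF!(dchar, No.useReplacementDchar)));

    // replacement: the bad byte becomes U+FFFD
    assert(collect(str.byDchar) == "ab"d ~ replacementDchar);

    // truncation: iteration stops before the first invalid sequence
    assert(collect(str[0 .. validLength(str)]) == "ab"d);

    // promotion: each byte is widened to a dchar, so 0xFF becomes U+00FF
    assert(collect(str.representation) == "ab\u00FF"d);
}
```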

