dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead
jfondren
julian.fondren at gmail.com
Sun Nov 7 01:59:47 UTC 2021
On Sunday, 7 November 2021 at 01:12:19 UTC, zjh wrote:
> On Thursday, 4 November 2021 at 08:24:59 UTC, zjh wrote:
>
> The `fundamental` problem is that we should provide users with
> `options` at compile time, not we `choose` for users.
> If you `choose` for users, there will always be dissatisfaction.
> You provide options ,and Users choose according to their needs.
>
> `auto decoding` and `utf8 string encoding` are both like this.
> If you choose for users, some people are always not happy.
D index with range checking: `arr[ind]`
D index without range checking: `arr.ptr[ind]`
C++ index with range checking: `arr.at(ind)`
C++ index without range checking: `arr[ind]`
There are two ways to index, and both D and C++ offer both ways.
Neither language removes a choice. If whether `arr[ind]` should
range-check were up for debate, the debate would be about what the
language should encourage by making it the default: the option
that's more naturally expressed and requires less typing.
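As a minimal sketch of the two D spellings (the helper names `checkedAt` and `uncheckedAt` are made up for illustration), both return the same element for a valid index and differ only in what happens for an invalid one:

```d
// Checked: `arr[ind]` verifies ind < arr.length
// (unless bounds checks are compiled out, e.g. with -boundscheck=off).
int checkedAt(int[] arr, size_t ind)
{
    return arr[ind];
}

// Unchecked: indexing the raw pointer skips the check.
// This form is not allowed in @safe code.
int uncheckedAt(int[] arr, size_t ind)
{
    return arr.ptr[ind];
}

void main()
{
    int[] arr = [10, 20, 30];
    assert(checkedAt(arr, 1) == 20);
    assert(uncheckedAt(arr, 1) == 20);
}
```

The checked form is the shorter one to type, which is the sense in which D encourages it.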
The question here of "what should a foreach over the dchar of a
char[] do?" is the same kind of question.
default: `str`
throwing: `str.byUTF!(dchar, UseReplacementDchar.no)`
asserting: `std.encoding.codePoints(str)`
replacement: `std.utf.byDchar(str)`
truncation: `str[0 .. std.encoding.validLength(str)]`
promotion: `std.string.representation(str)`
Put one of those inside `foreach (dchar c; ...) { }` and you get
that handling of bad UTF. Changing the default doesn't make the
other options go away, and the default has to do *something*
(even a compile-time error of "this is not supported behavior" is
*something*), so you have to make a choice about the default and
make some users unhappy.
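Here is a rough sketch of most of those options side by side, run against a deliberately broken string. The helpers `collect` and `badSample` are hypothetical names for this example, and `No.useReplacementDchar` is assumed to be an equivalent spelling of the flag in the throwing variant. The asserting variant is left out because its check is an `in` contract, so whether it fires depends on build flags.

```d
import core.exception : UnicodeException;
import std.algorithm.searching : canFind;
import std.encoding : validLength;
import std.exception : assertThrown;
import std.string : representation;
import std.typecons : No;
import std.utf : byDchar, byUTF, replacementDchar;

// Hypothetical helper: runs the foreach and collects what it yields.
dchar[] collect(R)(R source)
{
    dchar[] result;
    foreach (dchar c; source)
        result ~= c;
    return result;
}

// Hypothetical sample: "ab" followed by 0xFF, a byte that is never valid UTF-8.
string badSample()
{
    immutable(ubyte)[] raw = [0x61, 0x62, 0xFF];
    return cast(string) raw;
}

void main()
{
    string str = badSample();

    // default: the built-in decoding throws on the bad byte
    assertThrown!UnicodeException(collect(str));

    // throwing: the same outcome, requested explicitly
    assertThrown!UnicodeException(
        collect(str.byUTF!(dchar, No.useReplacementDchar)));

    // replacement: the bad byte shows up as U+FFFD
    assert(collect(str.byDchar).canFind(replacementDchar));

    // truncation: iteration stops before the first invalid sequence
    assert(collect(str[0 .. validLength(str)]) == "ab"d);

    // promotion: every byte becomes its own code point
    assert(collect(str.representation) == "ab\u00FF"d);
}
```

Only the expression after the semicolon changes between the cases; the loop body stays the same, which is why the question is about the default rather than about which behaviors exist.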