dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead
Mathias LANG
geod24 at gmail.com
Fri Nov 5 02:41:46 UTC 2021
On Friday, 5 November 2021 at 00:38:59 UTC, Walter Bright wrote:
>
> Surprisingly, the reverse seems to be true. Suppose you're
> writing a text editor. Then read a file with some bad UTF in
> it. The editor dies with an exception. You can't even edit the
> file to fix it.
>
> If you need to display user provided text, like in a browser,
> or all sorts of tools, you don't want to die with an exception.
> What are you going to do in an exception handler? You're just
> going to replace the offending bytes with ReplacementChar and
> go render it anyway.
If you handle user input, you take it as `ubyte[]` and validate
it.
Any decent editor will try to detect the encoding instead of
blindly assuming UTF-8.
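
A minimal sketch of that "take bytes, then validate" step, assuming the input is meant to be UTF-8. The helper name `isValidUtf8` is hypothetical; `std.utf.validate` is the Phobos function that throws `UTFException` on a malformed sequence.

```d
import std.utf : UTFException, validate;

// Hypothetical helper: reports whether raw bytes form valid UTF-8.
bool isValidUtf8(const(ubyte)[] raw)
{
    try
    {
        // Reinterpret the bytes as UTF-8 code units and check them.
        validate(cast(const(char)[]) raw);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    immutable(ubyte)[] good = [0x61, 0x62, 0x63]; // "abc"
    immutable(ubyte)[] bad = [0x61, 0xFF, 0x62];  // 0xFF never occurs in UTF-8
    assert(isValidUtf8(good));
    assert(!isValidUtf8(bad));
}
```

A caller that gets `false` back can then fall back to encoding detection rather than treating the bytes as a `string`.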
If you want to fix it, just deprecate the special case and tell
people who rely on the exception to use
`foreach (dchar d; someString.byUTF!(dchar, No.useReplacementDchar))`
and voilà. And if they don't want it to throw, it's shorter:
`foreach (dchar d; someString.byUTF!dchar)` (or `byDchar`).
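
To make the difference between the two spellings concrete, here is a rough sketch rather than a definitive recipe. The helper `strictDecodeThrows` and the sample bytes are made up for illustration; `byUTF`, `replacementDchar`, `UTFException` and `No.useReplacementDchar` come from Phobos (`std.utf` and `std.typecons`).

```d
import std.algorithm.searching : canFind;
import std.typecons : No;
import std.utf : byUTF, replacementDchar, UTFException;

// Hypothetical helper: true if strict decoding of s throws UTFException.
bool strictDecodeThrows(string s)
{
    try
    {
        // Explicit opt-in to the throwing behaviour.
        foreach (dchar d; s.byUTF!(dchar, No.useReplacementDchar)) {}
        return false;
    }
    catch (UTFException)
    {
        return true;
    }
}

void main()
{
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62]; // 'a', invalid byte, 'b'
    string text = cast(string) raw;

    // The strict form throws on the invalid byte.
    assert(strictDecodeThrows(text));
    assert(!strictDecodeThrows("abc"));

    // The default form substitutes replacementDchar instead of throwing.
    assert(text.byUTF!dchar.canFind(replacementDchar));
}
```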