dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Mathias LANG geod24 at gmail.com
Fri Nov 5 02:41:46 UTC 2021


On Friday, 5 November 2021 at 00:38:59 UTC, Walter Bright wrote:
>
> Surprisingly, the reverse seems to be true. Suppose you're 
> writing a text editor. Then read a file with some bad UTF in 
> it. The editor dies with an exception. You can't even edit the 
> file to fix it.
>
> If you need to display user provided text, like in a browser, 
> or all sorts of tools, you don't want to die with an exception. 
> What are you going to do in an exception handler? You're just 
> going to replace the offending bytes with ReplacementChar and 
> go render it anyway.

If you handle user input, you take it as `ubyte[]` and validate 
it.
Any decent editor will try to detect the encoding instead of 
blindly assuming UTF-8.
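
A minimal sketch of the "take it as `ubyte[]` and validate" step, assuming `std.utf.validate` is an acceptable check. The helper name `isValidUtf8` is hypothetical and only for illustration; a real editor would likely try other encodings when this returns false.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: reports whether the raw bytes form valid UTF-8.
bool isValidUtf8(const(ubyte)[] raw)
{
    // Reinterpret the bytes as chars without copying; nothing is decoded yet.
    auto text = cast(const(char)[]) raw;
    try
    {
        validate(text); // throws UTFException on an invalid sequence
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}
```

Only once the check passes would the buffer be treated as a `string`.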

If you want to fix it, just deprecate the special case and tell 
people to use `foreach (dchar d; someString.byUTF!(dchar, 
No.useReplacementDchar))` and voilà. And if they don't want it to 
throw, it's shorter:
`foreach (dchar d; someString.byUTF!dchar)` (or `byDchar`).
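
To make the difference between the two loops concrete, here is a rough sketch, assuming a Phobos version where `byUTF` accepts the `useReplacementDchar` flag. The function names `countStrict` and `countReplaced` are made up for the example.

```d
import std.typecons : No;
import std.utf : byUTF, replacementDchar, UTFException;

/// Hypothetical: counts code points, throwing UTFException on bad input.
size_t countStrict(string s)
{
    size_t n;
    foreach (dchar d; s.byUTF!(dchar, No.useReplacementDchar))
        ++n;
    return n;
}

/// Hypothetical: never throws on bad input; counts how many
/// replacement characters the default (lenient) decoding yields.
size_t countReplaced(string s)
{
    size_t n;
    foreach (dchar d; s.byUTF!dchar)
        if (d == replacementDchar)
            ++n;
    return n;
}
```

The strict variant spells out the throwing behaviour at the call site, while the short form silently substitutes `replacementDchar`.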

