dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Fri Nov 5 06:30:02 UTC 2021

On Friday, 5 November 2021 at 00:38:59 UTC, Walter Bright wrote:
> On 11/3/2021 10:41 PM, FeepingCreature wrote:
>> On Thursday, 4 November 2021 at 05:34:29 UTC, FeepingCreature 
>> wrote:
>>> One may disagree about autodecoding; I for one think it's a 
>>> sensible idea. However, a program should either process data 
>>> correctly or, if that is impossible, not at all. It should 
>>> not, ever, silently modify it "for you" while reading! I 
>>> predict this will lead to cryptic, hair-pulling bugs in user 
>>> code involving replacement characters appearing far 
>>> downstream of the error site.
>
> Surprisingly, the reverse seems to be true. Suppose you're 
> writing a text editor. Then read a file with some bad UTF in 
> it. The editor dies with an exception. You can't even edit the 
> file to fix it.
>
> If you need to display user provided text, like in a browser, 
> or all sorts of tools, you don't want to die with an exception. 
> What are you going to do in an exception handler? You're just 
> going to replace the offending bytes with ReplacementChar and 
> go render it anyway.
>
>> (This is floating point NaN all over again!)
>
> Poor NaNs are terribly misunderstood.
>
> Suppose you have an array of sensors. One goes bad. The "bad" 
> value is 0.0. So now your data analyzer is happily averaging 
> 0.0 into the results, silently skewing them.
>
> Now, if a NaN is returned instead, your "average" will be NaN. 
> You know it's no good. It won't be hidden.
>
> Uninitialized variables are sensors giving bad data. Having a 
> NaN in your result is a *good* thing.

I think the program should crash in all these cases. The text 
editor should crash. The browser should crash. The analyzer 
should see a NaN, and crash.
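
To make the analyzer case concrete, here is a minimal sketch. It is in Python rather than D only to keep it short, and the names `silent_average` and `checked_average` are hypothetical, made up for this example. The first function lets a NaN flow through to the result; the second stops at the first NaN, which is the behavior I am arguing for.

```python
import math


def silent_average(readings):
    """Plain mean: a NaN reading quietly turns the whole result into NaN."""
    return sum(readings) / len(readings)


def checked_average(readings):
    """Mean that refuses to continue as soon as any reading is NaN."""
    for index, value in enumerate(readings):
        if math.isnan(value):
            # Fail at the error site instead of passing the NaN downstream.
            raise ValueError(f"sensor {index} returned NaN")
    return sum(readings) / len(readings)


if __name__ == "__main__":
    readings = [1.0, float("nan"), 3.0]
    print(silent_average(readings))  # a NaN that someone still has to notice
    try:
        checked_average(readings)
    except ValueError as err:
        print("refusing to continue:", err)
```

With `silent_average`, the NaN only helps if somebody downstream checks for it. With `checked_average`, the failure names the bad sensor at the point where the bad reading arrived.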

These programs are *wrong.* They assumed they would only ever get 
valid Unicode, and they got something else. So we know they were 
written on wrong assumptions; why would we want to keep running 
code we know is untrustworthy? Let them crash, and let them be 
fixed to make fewer assumptions. Automagically handling errors by 
propagating them in an inert form robs developers and users of 
the chance to catch the mistake. It's no better than 0.0.
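
The two decoding policies under discussion can be sketched the same way. This is a Python analogue, not D's `foreach` or anything in Phobos: Python's `bytes.decode` happens to offer both behaviors through its `errors` argument. The helper names `decode_strict` and `decode_lossy` and the sample bytes are assumptions of mine, for illustration only.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_strict(raw: bytes) -> str:
    """Decode UTF-8 and raise UnicodeDecodeError on any invalid sequence."""
    return raw.decode("utf-8", errors="strict")


def decode_lossy(raw: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for each invalid sequence."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    bad = b"caf\xff latte"  # 0xFF can never appear in valid UTF-8

    text = decode_lossy(bad)
    print(REPLACEMENT_CHAR in text)     # the error is now just another character
    print(text.encode("utf-8") == bad)  # False: writing it back changes the data

    try:
        decode_strict(bad)
    except UnicodeDecodeError as err:
        print("invalid input at byte offset", err.start)
```

The lossy path keeps running, but the original byte is gone: re-encoding the result does not give back the input. The strict path stops and reports the offset of the bad byte, so the program (or its author) has to decide what to do about non-UTF input.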