dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead
FeepingCreature
feepingcreature at gmail.com
Mon Nov 8 08:11:12 UTC 2021
On Sunday, 7 November 2021 at 04:18:25 UTC, Walter Bright wrote:
> It's much better than 0.0. 0.0 is indistinguishable from valid
> data, and is a very common valid value.
>
> NaN and ReplacementChar are not valid and are easily
> distinguished.
No, that's exactly the problem. ReplacementChar is not easily
distinguished, because it's a valid Unicode character; that's
the whole point of it. So just like NaN, it can propagate
arbitrarily far through your processing pipeline before some
downstream process decides that it actually doesn't like it. And
at that point you generally have no chance to recover the source
of the issue: you know that something may have gone wrong, but
you don't even know whether it happened in your process or in the
input data. After all, if you were screening your input data for
ReplacementChar, you could just as easily have been screening it
for invalid UTF-8 to begin with. So while, yes, it's marginally
better than 0.0, because at least you know that *something* is
wrong, it does as little as possible to help you locate the
problem while still technically informing you. And all the
workarounds for that take the form of "throw everywhere a
ReplacementChar could be generated." So IMO, just do the
equivalent of turning on FE_INVALID, and do that from the start.
There's no point in getting rid of throw sites when you just force
users to re-add them manually because they fulfill a genuine need.
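To make the "not easily distinguished" point concrete, here is a minimal sketch in Python (used purely for illustration; it is not D, and `lossy` is a hypothetical helper name). Decoding a stray invalid byte with replacement produces exactly the same string as decoding input that legitimately contained U+FFFD, so nothing downstream can tell the two apart:

```python
# Illustration in Python, not D: lossy decoding with U+FFFD substitution.
REPLACEMENT = "\ufffd"

# A lone 0xFF byte is never valid UTF-8: this input is corrupt.
invalid_input = b"abc\xffdef"
# EF BF BD is the valid UTF-8 encoding of U+FFFD itself: this input is fine.
legit_input = b"abc\xef\xbf\xbddef"

def lossy(data: bytes) -> str:
    """Hypothetical helper: substitute U+FFFD for invalid sequences
    (analogous to decoding with replacementDchar)."""
    return data.decode("utf-8", errors="replace")

# After lossy decoding, the corrupt input and the legitimate input are
# the same string; the information about where the error was is gone.
print(lossy(invalid_input) == lossy(legit_input))  # True
```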
IMO, if you want to get rid of the exception overhead, I'd go the
other way and make invalid Unicode an abort(). Check your input
data, people.
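"Check your input data" in the FE_INVALID spirit means failing at the boundary, where you still know which input and which byte offset caused the problem. A minimal Python sketch, assuming a hypothetical `read_text` entry point that every external input passes through (again Python for illustration, not D's API):

```python
def read_text(data: bytes, source: str) -> str:
    """Hypothetical boundary check: decode strictly and report
    exactly where the input was invalid."""
    try:
        return data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as err:
        # err.start is the index of the first offending byte in `data`.
        raise ValueError(
            f"{source}: invalid UTF-8 at byte offset {err.start}"
        ) from err

print(read_text(b"hello", "greeting.txt"))  # hello
try:
    read_text(b"abc\xffdef", "upload.bin")
except ValueError as e:
    print(e)  # upload.bin: invalid UTF-8 at byte offset 3
```

Swapping the `raise` for a hard process exit would give the abort() behaviour suggested above; either way, the error surfaces once, at the point where it can still be located.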
More information about the Digitalmars-d
mailing list