dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead
FeepingCreature
feepingcreature at gmail.com
Mon Nov 8 08:18:51 UTC 2021
On Monday, 8 November 2021 at 08:11:12 UTC, FeepingCreature wrote:
> On Sunday, 7 November 2021 at 04:18:25 UTC, Walter Bright wrote:
>> It's much better than 0.0. 0.0 is indistinguishable from valid
>> data, and is a very common valid value.
>>
>> NaN and ReplacementChar are not valid and are easily
>> distinguished.
>
> No, that's exactly the problem. ReplacementChar is not easily
> distinguished, because it's a valid Unicode character - that's
> the whole point of it. So just like nan, it can propagate
> arbitrarily far through your processing pipeline before some
> downstream process decides that it actually doesn't like it.
Sorry, let me expand on this because I think it's the very core
of the disagreement.
As I see it, you have two options with NaN/ReplacementChar. You
can either accept that this is what you get and let it propagate
throughout your entire pipeline. In that case it's no better than
0.0; actually, NaN would be *worse*, because every computation it
touches also becomes NaN, leaving your process completely broken
with no way to fix it, whereas with 0.0 you can at least get some
reasonably usable data out.
Or you can decide that you don't want to be generating
NaN/ReplacementChar at all. Then where do you draw the line? At
the process input/output boundary? But if the process generates
NaNs or U+FFFDs, the process itself is what needs fixing, and a
check at the boundary won't tell you where it went wrong. So you
want to move your signaling as close to the production site as
possible.
Ideally, you want to fail at the exact line where the
problematic data was produced. So we're back to exceptions in
foreach. (Actually, an exception in cast(string) would be best.)
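To make the two policies concrete, here is a rough D sketch (an illustration, not code from this thread). `toCheckedString` is a hypothetical helper standing in for the checked cast(string) wished for above: it reinterprets raw bytes as a string and immediately calls Phobos's `std.utf.validate`, so the failure surfaces on the line that produces the string. The second half uses `std.utf.byDchar`, which by default decodes invalid sequences to `replacementDchar` (U+FFFD) instead of throwing, so the bad data quietly continues downstream.

```d
import std.algorithm.searching : canFind;
import std.exception : assertThrown;
import std.utf : UTFException, byDchar, replacementDchar, validate;

/// Hypothetical helper (not a Phobos function): reinterpret raw bytes
/// as a string, failing at the point of origin if they are not valid UTF-8.
string toCheckedString(immutable(ubyte)[] bytes)
{
    auto s = cast(string) bytes;
    validate(s); // throws UTFException on the first invalid sequence
    return s;
}

void main()
{
    immutable(ubyte)[] bad = [0x61, 0xFF, 0x62]; // 'a', invalid byte, 'b'

    // Fail-fast policy: the error is raised where the string is created.
    assertThrown!UTFException(toCheckedString(bad));

    // Substitution policy: decoding succeeds and U+FFFD travels onward.
    auto s = cast(string) bad;
    assert(s.byDchar.canFind(replacementDchar));
}
```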
And that's why I think ReplacementChar/NaN are no better than
0.0. You either embrace them fully as "valid" data or handle
them at the site of origin; any compromise leaves you worse off
than either extreme.
More information about the Digitalmars-d
mailing list