dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Mon Nov 8 12:32:08 UTC 2021

On Monday, 8 November 2021 at 12:02:12 UTC, Ola Fosheim Grøstad 
wrote:
> It is very difficult to follow your line of reasoning, because 
> ReplacementChar is nothing like qNaN, it is more like sNaN. 
> ReplacementChar is not the result of an approximation failure, 
> it is corruption of the input (or maybe a foreign encoding).
>
> Getting a 0.0 instead of qNaN in a signal is absolutely 
> disastrous. Walter is 100% right on that one. 0.0 will 
> introduce a peak across the frequency range. qNaN can be 
> removed with no distortion.
>
> Should you express your types strongly? Yes, but then you also 
> should include things like negative numbers, denormal numbers, 
> ±infinity, ranges [1.0-0.0] and so on.

Yeah, I noticed this after I clicked post, but I didn't want to 
add a third comment. I think the difference is fundamentally one 
of "time-series vs progressive data". I don't think that's the 
right term, but I don't know a better one. If you have a 
measurement series interspersed with NaNs, you know, for 
instance, that the values are assigned to times or to positions, 
so you can decide semantically what to do with the data: you 
might flag the NaNs as errors, or drop them and interpolate. 
However, it is much harder to see 
where such a behavior would be useful for ReplacementCharacter. 
Generally, you're reading data that someone wrote for a reason, 
and ReplacementCharacter would almost universally indicate that 
there was something you were meant to pick up on but failed to 
handle. As such, it's much less clear to me whether there are 
any cases where "text with replacement characters" or "text 
with replacement characters removed" is useful at all.
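
To make the contrast concrete, here is a minimal sketch, written in Python rather than D purely for illustration. The helper `interpolate_nans` and the sample data are hypothetical and not taken from any library. It shows a NaN gap being repaired from its neighbours because its position in the series is known, while a replacement character in decoded text only records that something was lost, not what it was.

```python
import math

def interpolate_nans(samples):
    """Linearly interpolate NaN gaps from the nearest valid neighbours.

    Hypothetical helper for illustration; gaps at the edges stay NaN.
    """
    out = list(samples)
    valid = [i for i, v in enumerate(out) if not math.isnan(v)]
    for i, v in enumerate(out):
        if math.isnan(v):
            left = max((j for j in valid if j < i), default=None)
            right = min((j for j in valid if j > i), default=None)
            if left is None or right is None:
                continue  # no neighbour on one side: leave the NaN in place
            t = (i - left) / (right - left)
            out[i] = out[left] + t * (out[right] - out[left])
    return out

# Measurement series: the NaN sits at a known position, so it can be repaired.
series = [1.0, float("nan"), 3.0, 4.0]
repaired = interpolate_nans(series)

# Text: 0xFF can never occur in well-formed UTF-8, so decoding with
# errors="replace" substitutes U+FFFD for it.
raw = b"pay \xff100 to Bob"
decoded = raw.decode("utf-8", errors="replace")

# Dropping the replacement character yields plausible-looking text,
# but whatever the writer put there is gone.
stripped = decoded.replace("\ufffd", "")
```

Here `repaired` is a usable series again, whereas neither `decoded` nor `stripped` tells the reader what was written before the amount, which is the asymmetry I mean.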