A D vs. Rust example
Walter Bright
newshound2 at digitalmars.com
Fri Oct 28 04:27:25 UTC 2022
On 10/27/2022 4:55 PM, H. S. Teoh wrote:
> You don't have to refuse anything. Just substitute it with the Unicode
> replacement character in your standard library, and no downstream code
> will need to worry about it anymore.
That's one way to deal with it. But until the data has been processed that way, it
isn't a string, if strings are required to be strict UTF-8.
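A minimal sketch of the replacement approach, written in Rust because the thread compares D with Rust. It assumes only Rust's standard `String::from_utf8_lossy`, which substitutes U+FFFD for invalid sequences and returns a `Cow<str>`. The result is borrowed when the input was already valid and freshly allocated when something had to be replaced. The helper name `lossy_needs_copy` is hypothetical, chosen for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical helper: reports whether lossy conversion had to allocate.
/// `from_utf8_lossy` borrows the input when it is already valid UTF-8 and
/// returns a new String (with U+FFFD substituted) only when it is not.
fn lossy_needs_copy(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Owned(_))
}

fn main() {
    let bad: &[u8] = b"abc\xFFdef"; // 0xFF never occurs in valid UTF-8

    // The invalid byte is replaced by the Unicode replacement character.
    let fixed = String::from_utf8_lossy(bad);
    assert_eq!(fixed, "abc\u{FFFD}def");

    // Valid input is borrowed; invalid input forces a copy.
    assert!(!lossy_needs_copy(b"hello"));
    assert!(lossy_needs_copy(bad));

    println!("{fixed}");
}
```

The whole buffer is scanned for validity either way; the allocation happens only when a replacement is actually needed.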
> And should you ever need to process invalid sequences (e.g., in a
> utility to repair broken encodings), just read it as binary and process
> it that way.
Yes, but you can't do it with strings if strings don't allow invalid sequences.
>> A better approach is to have the string processing be tolerant of
>> invalid UTF-8.
>
> Which makes string-processing code more fragile and possibly more
> complex.
I've coded a lot of Phobos to be tolerant of invalid UTF-8. It turns out that
it's *unusual* to need to decode UTF-8 at all. The approach is robust, not fragile.
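To illustrate why tolerant code often needs no decoding, here is a short Rust sketch. It is not Phobos code, and the function names `lines_tolerant` and `find_bytes` are hypothetical. It splits lines and searches for a substring directly on bytes. This relies on two properties of UTF-8. Bytes below 0x80 never occur inside a multibyte sequence. Lead bytes are also distinguishable from continuation bytes. Together these mean byte-level matching works without validating the buffer, and invalid bytes simply pass through untouched.

```rust
/// Hypothetical helper: split a buffer into lines without decoding.
/// b'\n' (0x0A) can never appear inside a multibyte UTF-8 sequence,
/// so the split is correct even when other bytes are invalid.
fn lines_tolerant(text: &[u8]) -> Vec<&[u8]> {
    text.split(|&b| b == b'\n').collect()
}

/// Hypothetical helper: naive byte-wise substring search.
/// UTF-8 is self-synchronizing, so in valid text a match of a valid
/// needle always starts on a character boundary; no decoding is needed.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    // "café", then a line starting with an invalid 0xFF byte, then "end".
    let text: &[u8] = b"caf\xC3\xA9\n\xFF bad\nend";

    assert_eq!(lines_tolerant(text).len(), 3);
    assert_eq!(find_bytes(text, "é".as_bytes()), Some(3));
    assert_eq!(find_bytes(text, b"end"), Some(12));
}
```

Neither function ever makes a copy or needs a separate cleanup pass. The invalid byte is simply carried along as part of the second line.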
> Better to let the standard library replace all invalid
> sequences with the replacement character so that downstream code doesn't
> have to worry about it anymore.
Then you have another processing step and have to make a copy of the string. As
I wrote, I have some experience with this. Being tolerant of invalid UTF-8 is a
winning strategy.
More information about the Digitalmars-d mailing list