A D vs. Rust example

H. S. Teoh hsteoh at qfbox.info
Thu Oct 27 23:55:21 UTC 2022


On Thu, Oct 27, 2022 at 04:37:12PM -0700, Walter Bright via Digitalmars-d wrote:
> On 10/24/2022 1:04 PM, Dukc wrote:
> > it's UTF-8 string type. Not only it is guaranteed to point to valid
> > memory, it is statically guaranteed to point to valid UTF-8!
> The trouble with that is much of the UTF-8 out there is not valid. You
> don't want, for example, your html page to refuse to display at all
> because there's a couple invalid UTF-8 sequences in it. You don't want
> your text editor to refuse to load a file with invalid UTF-8 in it,
> either. You don't want your forms processor to summarily reject
> anything with invalid UTF-8 in it.

You don't have to refuse anything.  Just have your standard library
substitute each invalid sequence with the Unicode replacement character
(U+FFFD), and no downstream code will need to worry about it anymore.
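
As a rough illustration of the substitution, here is a minimal Rust
sketch.  It assumes the input arrives as a byte slice; the wrapper name
decode_lossy is made up for this example, while String::from_utf8_lossy
is the standard library call that does the actual replacement.

```rust
use std::borrow::Cow;

/// Decode arbitrary bytes into text, replacing each invalid UTF-8
/// sequence with U+FFFD so that callers only ever see valid UTF-8.
/// (`decode_lossy` is a hypothetical helper name, not a std API.)
fn decode_lossy(input: &[u8]) -> Cow<'_, str> {
    // Borrows the input unchanged when it is already valid UTF-8;
    // allocates a new String only when something must be replaced.
    String::from_utf8_lossy(input)
}
```

Calling decode_lossy(b"a\xFFb") yields "a\u{FFFD}b", and already-valid
input comes back borrowed, without a copy.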

And should you ever need to process invalid sequences (e.g., in a
utility to repair broken encodings), just read the input as binary and
process it that way.
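
For instance, a repair tool could scan the raw bytes and report where
the invalid sequences are.  The following Rust sketch is only one
possible approach, under the assumption that the whole input fits in
memory; invalid_ranges is a hypothetical helper, built on the standard
std::str::from_utf8 and the valid_up_to() / error_len() methods of
Utf8Error.

```rust
use std::ops::Range;

/// Return the byte ranges of `input` that are not valid UTF-8.
/// (`invalid_ranges` is a hypothetical helper name, not a std API.)
fn invalid_ranges(mut input: &[u8]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut offset = 0; // position of `input` within the original slice
    loop {
        match std::str::from_utf8(input) {
            // The remaining bytes are all valid: done.
            Ok(_) => return ranges,
            Err(e) => {
                let start = e.valid_up_to();
                // error_len() is None when the input ends in the middle
                // of a sequence; treat the rest as one invalid range.
                let len = e.error_len().unwrap_or(input.len() - start);
                ranges.push(offset + start..offset + start + len);
                // Skip past the invalid bytes and keep scanning.
                offset += start + len;
                input = &input[start + len..];
            }
        }
    }
}
```

With this, invalid_ranges(b"ab\xFFcd") reports the single range 2..3,
and a truncated trailing sequence is reported as one range that runs to
the end of the input.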


> A better approach is to have the string processing be tolerant of
> invalid UTF-8.

Which makes string-processing code more fragile and possibly more
complex. Better to let the standard library replace all invalid
sequences with the replacement character so that downstream code doesn't
have to worry about it anymore.


T

-- 
Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye

