A D vs. Rust example
Dukc
ajieskola at gmail.com
Fri Oct 28 09:07:35 UTC 2022
On Friday, 28 October 2022 at 04:27:25 UTC, Walter Bright wrote:
>
>>> A better approach is to have the string processing be
>>> tolerant of
>>> invalid UTF-8.
>>
>> Which makes string-processing code more fragile and possibly
>> more
>> complex.
>
> I've coded a lot of Phobos to be tolerant of invalid UTF-8. It
> turns out that it's *unusual* to need to decode UTF-8 at all.
> It's robust, not fragile.
Good point. But that could easily be handled by making the naturally
tolerant functions accept `ubyte`s.
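The same idea in Rust terms: a function that operates on raw bytes never needs to decode, so invalid UTF-8 simply passes through. A minimal sketch (the function name is hypothetical, chosen for illustration):

```rust
// Hypothetical sketch: a substring search over raw bytes.
// It never decodes, so invalid UTF-8 in the haystack is harmless.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

fn main() {
    // 0xFF can never appear in valid UTF-8, yet the search still works.
    let data: &[u8] = b"hello \xFF world";
    assert!(contains_bytes(data, b"world"));
}
```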
>
>
>> Better to let the standard library replace all invalid
>> sequences with the replacement character so that downstream
>> code doesn't
>> have to worry about it anymore.
>
> Then you have another processing step, and have to make a copy
> of the string. As I wrote, I have some experience with this.
> Being tolerant of invalid UTF-8 is a winning strategy.
Don't you remember? Ranges are lazy. No copy needed. And IIRC
Rust also has a lazy iterator over an unvalidated binary blob to
accomplish the same.
And it's not an extra step. If you don't validate a string, then
the string processing functions (that need to decode) have to do
that anyway.
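For reference, the no-copy behavior in Rust's standard library: `String::from_utf8_lossy` returns a `Cow`, so it only allocates when the input actually contains invalid sequences; valid input is borrowed as-is. (For fully lazy iteration over unvalidated bytes there is also `<[u8]>::utf8_chunks`.)

```rust
use std::borrow::Cow;

fn main() {
    // Valid input: from_utf8_lossy borrows, no copy is made.
    let valid: &[u8] = b"all good";
    assert!(matches!(String::from_utf8_lossy(valid), Cow::Borrowed(_)));

    // Invalid input: only then is a copy made, with the invalid
    // sequence replaced by U+FFFD (the replacement character).
    let invalid: &[u8] = b"bad \xFF byte";
    let fixed = String::from_utf8_lossy(invalid);
    assert_eq!(fixed, "bad \u{FFFD} byte");
}
```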
The Rust way has the advantages that:
- No string handling function needs to throw anything. They could
all be `nothrow`.
- If two string handling functions that need to decode are
chained together, they don't both need to redundantly check
for invalid UTF-8.
- You don't accidentally forget to check for invalid UTF-8, or
recheck an already-checked string.
The first two could also be accomplished by asserting on invalid
UTF-8 instead of throwing an exception, but only static
guarantees give the third advantage.