A D vs. Rust example

Dukc ajieskola at gmail.com
Fri Oct 28 09:07:35 UTC 2022


On Friday, 28 October 2022 at 04:27:25 UTC, Walter Bright wrote:
>
>>> A better approach is to have the string processing be 
>>> tolerant of
>>> invalid UTF-8.
>> 
>> Which makes string-processing code more fragile and possibly 
>> more
>> complex.
>
> I've coded a lot of Phobos to be tolerant of invalid UTF-8. It 
> turns out that it's *unusual* to need to decode UTF-8 at all. 
> It's robust, not fragile.

Good point. But that could easily be solved by making the 
naturally tolerant functions accept `ubyte`s.
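To illustrate the "tolerant functions over raw bytes" idea in Rust terms (the `count_lines` function here is a hypothetical example, the analogue of a D range algorithm taking `ubyte`s): many "string" operations never need to decode UTF-8 at all, so they can run over an unvalidated byte slice and simply pass invalid sequences through.

```rust
// Sketch: counting lines never decodes UTF-8, so it is naturally
// tolerant of invalid sequences when written over raw bytes.
fn count_lines(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'\n').count()
}

fn main() {
    // Works even though 0xFF is not valid UTF-8 anywhere.
    let data: &[u8] = b"one\ntwo\xFF\nthree\n";
    assert_eq!(count_lines(data), 3);
}
```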

>
>
>> Better to let the standard library replace all invalid
>> sequences with the replacement character so that downstream 
>> code doesn't
>> have to worry about it anymore.
>
> Then you have another processing step, and have to make a copy 
> of the string. As I wrote, I have some experience with this. 
> Being tolerant of invalid UTF-8 is a winning strategy.

Don't you remember? Ranges are lazy. No copy needed. And IIRC 
Rust also has a lazy iterator over an unvalidated binary blob to 
accomplish the same.
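One way to sketch the "no extra copy" point in stable Rust is `String::from_utf8_lossy`: when the bytes are already valid UTF-8 it returns a borrowed `&str` with no allocation or copy, and it only builds a new string when it actually has to insert U+FFFD. (Rust also has a chunk-wise iterator over unvalidated bytes, but this is the simplest illustration.)

```rust
// Sketch: lossy decoding that copies only when the input is invalid.
use std::borrow::Cow;

fn main() {
    let valid: &[u8] = b"hello";
    let invalid: &[u8] = &[b'h', 0xFF, b'i'];

    // Valid input: Cow::Borrowed, i.e. no copy was made.
    assert!(matches!(String::from_utf8_lossy(valid), Cow::Borrowed(_)));

    // Invalid input: the bad byte is replaced with U+FFFD,
    // which forces an owned (copied) string.
    let fixed = String::from_utf8_lossy(invalid);
    assert_eq!(fixed, "h\u{FFFD}i");
    assert!(matches!(fixed, Cow::Owned(_)));
}
```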

And it's not an extra step. If you don't validate a string, then 
the string processing functions (that need to decode) have to do 
that anyway.

The Rust way has the advantages that:

  - No string handling function needs to throw anything. They 
could all be `nothrow`.
  - If two string handling functions that need to decode are 
chained to each other, they don't need to both redundantly check 
for invalid UTF-8.
  - You don't accidentally forget to check for invalid UTF-8, or 
recheck an already checked string.

The first two could also be accomplished by asserting on invalid 
UTF-8 instead of throwing an exception, but only static 
guarantees give the third advantage.
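A minimal sketch of that static guarantee in Rust (the `first_word` and `shout` functions are made-up stand-ins for chained string handling functions): validation happens exactly once at the `&[u8]` to `&str` boundary, and everything downstream takes `&str`, so it can neither fail on encoding errors nor need to re-validate.

```rust
// Sketch: validate once, then the type system guarantees validity.
use std::str;

// Hypothetical chained "string handling" functions: neither can
// hit an encoding error, because `&str` is valid UTF-8 by construction.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let bytes: &[u8] = b"hello world";
    // The single validation step; invalid input surfaces here as a
    // `Result::Err`, not as an exception deep inside the chain.
    let text = str::from_utf8(bytes).expect("invalid UTF-8");
    assert_eq!(shout(first_word(text)), "HELLO");
}
```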


More information about the Digitalmars-d mailing list