Thin UTF8 string wrapper

Joseph Rushton Wakeling joseph.wakeling at webdrake.net
Sat Dec 7 17:30:37 UTC 2019


On Saturday, 7 December 2019 at 15:57:14 UTC, Jonathan M Davis 
wrote:
> There may have been some tweaks to std.encoding here and there, 
> but for the most part, it's pretty ancient. Looking at the 
> history, it's Seb who marked some of it as being a replacement 
> for std.utf, which is just plain wrong.

Ouch!  I must say it was a surprise to read, precisely because 
std.encoding seemed weird and clunky.  Good to know that it's 
misleading.

Unfortunately that adds to the list I have of weirdly misleading 
docs that seem to have crept in over the last months/years :-(

> std.utf.validate does need a replacement, but doing so gets 
> pretty complicated. And looking at std.encoding.isValid, I'm 
> not sure that what it does is any better than simply wrapping 
> std.utf.validate and returning a bool based on whether an 
> exception was thrown.
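
For reference, the wrapping described above might look roughly like the following. This is only a sketch: isValidViaException is a made-up name, not anything in Phobos, and it still pays the cost of a thrown (and GC-allocated) exception on invalid input.

```d
import std.utf : validate, UTFException;

// Hypothetical helper (not part of Phobos): turns the exception thrown by
// std.utf.validate into a bool result.
bool isValidViaException(const(char)[] s)
{
    try
    {
        validate(s);   // throws UTFException if s is not well-formed
        return true;
    }
    catch (UTFException)
    {
        return false;  // the exception has already been allocated and thrown
    }
}
```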

Unfortunately I'm dealing with a use case where throwing 
exceptions (and indeed, anything that generates garbage) is best 
avoided.  That's why I was looking for a function that returns a 
bool ;-)
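
To make the requirement concrete, here is a minimal hand-rolled sketch of the kind of function I mean, assuming UTF-8 input only. The name isWellFormedUTF8 is hypothetical and this is not how std.utf or std.encoding do it; it simply walks the bytes, rejects overlong forms, surrogates and out-of-range values, and stops at the first bad sequence.

```d
// Hypothetical helper (not part of Phobos): returns true if `s` is
// well-formed UTF-8, without throwing and without touching the GC.
bool isWellFormedUTF8(scope const(char)[] s) @safe pure nothrow @nogc
{
    size_t i = 0;
    while (i < s.length)
    {
        immutable uint b0 = s[i];
        size_t len;  // total number of bytes in this sequence
        uint cp;     // code point decoded so far
        uint minCp;  // smallest code point allowed for this length

        if (b0 < 0x80) { ++i; continue; }  // ASCII fast path
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minCp = 0x1_0000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (s.length - i < len) return false;  // truncated sequence

        foreach (k; 1 .. len)
        {
            immutable uint b = s[i + k];
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < minCp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

Because of the attributes, the compiler itself checks that nothing in there throws or allocates, which is the property I'm after.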

> Depending on the string, it would actually be faster to use 
> validate, because std.encoding.isValid iterates through the 
> entire string regardless. The way it checks validity is also 
> completely different from what std.utf does. Either way, some 
> of the std.encoding internals do seem to be an alternate 
> implementation of what std.utf has, but outside of std.encoding 
> itself, std.utf is what Phobos uses for UTF-8, UTF-16, and 
> UTF-32, not std.encoding.

Thanks -- good to know.

> I did do a PR at one point to add isValidUTF to std.utf so that 
> we could replace std.utf.validate, but Andrei didn't like the 
> implementation, so it didn't get merged, and I haven't gotten 
> around to figuring out how to implement it more cleanly.

Thanks for the attempt, at least!  While I get the reasons it was 
rejected, it feels a bit of a shame -- surely it's easier to do a 
more major under-the-hood rewrite with the public API (and tests) 
already in place ... :-\

