List of Phobos functions that allocate memory?

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Feb 8 03:27:26 PST 2014


08-Feb-2014 09:45, Jonathan M Davis wrote:
> On Friday, February 07, 2014 21:04:08 Jonathan M Davis wrote:
> Actually, thinking this through some more, if we can replace invalid Unicode
> with 0xFFFD, and have all algorithms work with that and consider it valid
> Unicode (rather than getting weird bugs due to invalid Unicode), then if
> decode returned that on error rather than throwing, we wouldn't actually need
> to check the return value. It wouldn't matter that the Unicode was invalid.
> So, we wouldn't even need to _care_ that the Unicode was invalid. Anyone who
> _did_ care could call isValidUnicode to validate the Unicode first, and those
> who didn't wouldn't need to worry about UTFException being thrown, because
> everything would still work even if the string was invalid Unicode.

Hm.. yes. I gotta read the whole thread next time :)


> So, if that's indeed what 0xFFFD does, and that's what Dmitry meant by
> proposing that we return that rather than throwing, then I rescind my
> assessment that throwing was the best way to go and have to agree that
> returning 0xFFFD would be better. I was responding under the assumption that
> you had to check for 0xFFFD and respond to it in order to avoid having your code
> be buggy, in which case throwing would be far better. But if 0xFFFD is
> considered valid Unicode,

It is.

> then returning that would be a fantastic solution.
> And if that's the case, we only need two functions, not three:
>
> 1. decode, which returns 0xFFFD on decode failure
>
> 2. isValidUnicode, which returns whether the string is valid
>

Yay.
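
For illustration only, the two-function split can be sketched outside of D. The snippet below is Python, not Phobos: `decode_lossy` and `is_valid_utf8` are hypothetical names chosen here, and the only library behaviour relied on is Python's `bytes.decode` with the `replace` and `strict` error handlers. It is a rough analogy for the proposal, not a statement of how std.utf works or would work.

```python
# Hypothetical helpers mirroring the proposed design (names are assumptions).

def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for undecodable bytes; never raises."""
    return data.decode("utf-8", errors="replace")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the input is well-formed UTF-8, for callers who care."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


good = "naïve".encode("utf-8")
bad = b"ab\xffcd"  # 0xFF can never appear in well-formed UTF-8

lossy_good = decode_lossy(good)  # unchanged text
lossy_bad = decode_lossy(bad)    # the bad byte becomes U+FFFD
```

Code that does not care about validity calls only the lossy decoder; code that does care checks validity up front and never has to catch an exception in the middle of processing.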

> And I actually really like the idea that we could just operate on invalid
> Unicode as valid Unicode this way, making it so that most code doesn't need to
> care, and code that _does_ need to care, can validate the strings first. Right
> now, pretty much all string code needs to care in order to avoid processing
> invalid Unicode, which is much messier.
>
Hooray! The nice part is that, for example, I can run a regex on partially
broken text and still get sane results out of it.
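
A minimal sketch of that regex case, again in Python rather than D and assuming only the standard `re` module and the `replace` error handler; `find_words` is a hypothetical helper invented for this example. The invalid bytes turn into U+FFFD, which the pattern simply does not match, so the surrounding text is still searchable.

```python
import re


def find_words(data: bytes) -> list:
    """Hypothetical helper: decode lossily, then match ASCII words."""
    # Invalid bytes become U+FFFD instead of raising an exception.
    text = data.decode("utf-8", errors="replace")
    return re.findall(r"[A-Za-z]+", text)


broken = b"hello \xff\xfe world"  # two bytes that are never valid UTF-8
words = find_words(broken)
```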

> - Jonathan M Davis
>


-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list