List of Phobos functions that allocate memory?

Marco Leise Marco.Leise at gmx.de
Tue Feb 18 10:16:14 PST 2014


On Tue, 18 Feb 2014 12:14:58 +0400,
Dmitry Olshansky <dmitry.olsh at gmail.com> wrote:

> In a sense, \uFFFD means broken encoding.

In a sense yes, in another no. It is a defined code point and
it has a symbol: � a diamond with a question mark inside.

> What about lone surrogates?

Those are actual broken encoding.
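
To make the distinction concrete, here is a small sketch (in Python
rather than D, since only the encoding rules matter here; the helper
name is_well_formed_utf8 is made up for illustration). U+FFFD has a
well-formed UTF-8 encoding, while the bytes you would get by encoding
a lone surrogate like any other code point are rejected by a strict
decoder.

```python
# U+FFFD is an ordinary code point with a well-formed UTF-8 encoding.
replacement = "\ufffd"
encoded = replacement.encode("utf-8")    # three bytes: EF BF BD
round_trip = encoded.decode("utf-8")     # strict decoding succeeds

# ED A0 80 is what U+D800 (a lone surrogate) would look like if it were
# encoded like a normal code point. Strict UTF-8 decoding rejects it.
lone_surrogate_bytes = b"\xed\xa0\x80"


def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if strict UTF-8 decoding succeeds."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_well_formed_utf8(encoded))               # U+FFFD
print(is_well_formed_utf8(lone_surrogate_bytes))  # lone surrogate
```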

> Private use symbols that must not occur in transmission?

Then that "transmission" seems to exclude private symbols. It
may also exclude special characters like \uFFFD. That's part
of the particular protocol and should be handled there.

> They are all 
> displayed in various Unicode listings. As for 'playing on broken strings' 
> - ignoring broken/partially broken strings - I specifically think that 
> it's what most users/use cases want.
> 
> A more useful and sensible default of decoding is to substitute on 
> broken encoding. And it's a standard procedure. It's particularly better 
> for displaying text.

Correct. I just don't agree that displaying text should be
the one true use case, and I prefer exceptions over silent
loss of information as the default.
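
A sketch of what I mean by silent loss, again in Python only because
its decoder exposes both policies behind one call (decode_strict and
decode_substituting are hypothetical names, not Phobos functions).
Once the substitution has happened, the result of decoding broken
input can no longer be told apart from text that really contained
U+FFFD.

```python
broken = b"abc\xffdef"                     # 0xFF never occurs in valid UTF-8
genuine = "abc\ufffddef".encode("utf-8")   # text that really contains U+FFFD


def decode_strict(raw: bytes) -> str:
    """Throwing policy: raises UnicodeDecodeError on bad input."""
    return raw.decode("utf-8", errors="strict")


def decode_substituting(raw: bytes) -> str:
    """Substituting policy: each invalid sequence becomes U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# The throwing policy reports the problem to the caller.
try:
    decode_strict(broken)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# The substituting policy yields the same string for both inputs.
same_result = decode_substituting(broken) == decode_strict(genuine)
print(strict_failed, same_result)
```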

> As a reminder: since it's only a decode, you are still in control of the 
> original text - in fact you may re-test what bytes are there IF you want.
> 
> The way of "throw on bad encoding" could be useful but I hardly see it 
> as what you want for default.
> 
> I'm wary of breaking code that relies on throwing. For the moment I 
> think the best course of action would be to introduce xdecode or some 
> such that will do substitution on failure, see how it floats and then 
> change ranges/foreach etc to use xdecode.

We won't convince each other. Let's just stop here.

-- 
Marco


