List of Phobos functions that allocate memory?

Dmitry Olshansky dmitry.olsh at gmail.com
Tue Feb 18 00:14:58 PST 2014


17-Feb-2014 06:19, Marco Leise wrote:
> On Sun, 09 Feb 2014 12:18:41 +0400,
> Dmitry Olshansky <dmitry.olsh at gmail.com> wrote:
>
>> 09-Feb-2014 09:35, Marco Leise wrote:
>>> That's neither an improvement over calling "validate" nor does
>>> that deal with distinguishing between invalid UTF and
>>
>> Means text is broken but wasn't ever read...
>>> \uFFFD
>>> in the input.
>> ...means text was broken sometime before.
>>
>> It hardly makes any difference to most applications.
>> Normal text doesn't contain \uFFFD.
>
> Of course it does. It is a valid symbol and a lot of websites
> describing the "Specials" Unicode block make use of it, like
> the one on Wikipedia:
> http://en.wikipedia.org/wiki/Specials_(Unicode_block)
>
> With your definition, pulling such a document from the web and
> parsing it in D would mean playing on broken strings.

In a sense, \uFFFD means broken encoding. What about lone surrogates?
Private use symbols that must not occur in transmission? They are all
displayed in various Unicode listings. As for 'playing on broken
strings': carrying on despite broken or partially broken strings is, I
think, exactly what most users and use cases want.

A more useful and sensible default for decoding is to substitute on
broken encoding, which is standard procedure. It is particularly well
suited to displaying text.

As a reminder: since it's only a decode, you are still in control of the
original text; in fact you may re-test what bytes are there IF you want.
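
A minimal sketch of that point, written in Python rather than D purely
for illustration; it is not Phobos code. The helper names
(decode_substituting, was_broken) and the sample inputs are made up for
this example. It decodes with substitution and, because the raw bytes
are still around, can tell broken encoding apart from a literal \uFFFD
in otherwise valid input:

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_substituting(raw: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for ill-formed sequences."""
    return raw.decode("utf-8", errors="replace")


def was_broken(raw: bytes) -> bool:
    """Re-test the original bytes: True only if the encoding is invalid."""
    try:
        raw.decode("utf-8")  # strict decoding raises on bad input
    except UnicodeDecodeError:
        return True
    return False


# Valid text that legitimately contains U+FFFD (e.g. a page about Specials).
valid = "Specials block: \ufffd".encode("utf-8")
# Ill-formed input: 0xC3 starts a 2-byte sequence but a space follows.
broken = b"caf\xc3 au lait"

for raw in (valid, broken):
    text = decode_substituting(raw)
    print(repr(text), "broken encoding:", was_broken(raw))
```

Both decoded strings contain \uFFFD, yet only the second input is
reported as broken, since that check looks at the bytes and not at the
decoded text.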

The "throw on bad encoding" approach could be useful, but I hardly see it
as what you want by default.

I'm wary of breaking code that relies on throwing. For the moment I
think the best course of action would be to introduce xdecode or some
such that will do substitution on failure, see how it floats, and then
change ranges/foreach etc. to use xdecode.
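
A rough sketch of what such an xdecode could look like, again in Python
and not actual Phobos code. Everything here is an assumption made for
illustration: the (data, index) -> (character, next index) shape, the
strict `decode` it wraps, and the policy of skipping exactly one byte
after a failure.

```python
from typing import Tuple

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode(data: bytes, index: int) -> Tuple[str, int]:
    """Strict decoder: one code point at data[index], raises on bad input."""
    first = data[index]
    if first < 0x80:
        length = 1
    elif 0xC2 <= first <= 0xDF:
        length = 2
    elif 0xE0 <= first <= 0xEF:
        length = 3
    elif 0xF0 <= first <= 0xF4:
        length = 4
    else:
        # Stray continuation byte or a byte that never starts a sequence.
        raise ValueError("invalid UTF-8 lead byte")
    # Strict decoding raises UnicodeDecodeError (a ValueError subclass)
    # if the sequence is truncated or otherwise ill-formed.
    return data[index:index + length].decode("utf-8"), index + length


def xdecode(data: bytes, index: int) -> Tuple[str, int]:
    """Hypothetical substituting variant: never raises on bad encoding."""
    try:
        return decode(data, index)
    except ValueError:
        # Assumed policy: emit U+FFFD and skip a single byte.
        return REPLACEMENT, index + 1


def decode_all(data: bytes) -> str:
    """What a range/foreach built on xdecode would yield, collected."""
    out = []
    i = 0
    while i < len(data):
        ch, i = xdecode(data, i)
        out.append(ch)
    return "".join(out)
```

Code that relies on throwing keeps calling `decode` unchanged; only
callers that opt in to `xdecode` get substitution.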

>>>> [...]
>>>> Every single text editor out there seems to disagree with you: they do
>>>> show you partially substituted text, not a dialog box "My bad, it's
>>>> broken UTF-8, I'm giving up!".
>
>>> gedit does in fact throw an error message at you
>>> saying "My bad, it's broken UTF-8, I'm giving up!".
>
>> I know, and it's a piece of junk :)
>> Seriously, it doesn't even have regular expressions for search and replace!
>
> https://yourlogicalfallacyis.com/no-true-scotsman :p

Well, gedit is a nice example of why just throwing an exception is not
good enough for many apps (editors in particular). The fact that it's a
piece of junk might be irrelevant ;)

-- 
Dmitry Olshansky

