List of Phobos functions that allocate memory?

Dmitry Olshansky dmitry.olsh at gmail.com
Fri Feb 7 12:14:04 PST 2014


07-Feb-2014 21:07, Andrej Mitrovic wrote:
> On 2/7/14, Dmitry Olshansky <dmitry.olsh at gmail.com> wrote:
>> Much simpler - it returns a special dchar to designate bad encoding. And
>> there is one defined by Unicode spec.
>
> A NaN for chars? Sounds great to me! :)
>

It's called U+FFFD (\uFFFD), the replacement character, and it exists
specifically for bad encodings. I wonder why nobody perused the spec
when writing std.utf.decode in the first place...

5.22 Best Practice for U+FFFD Substitution

When converting text from one character encoding to another, a
conversion algorithm may encounter unconvertible code units. This is
most commonly caused by some sort of corruption of the source data, so
that it does not correctly follow the specification for that character
encoding. Examples include dropping a byte in a multibyte encoding such
as Shift-JIS, improper concatenation of strings, a mismatch between an
encoding declaration and actual encoding of text, use of non-shortest
form for UTF-8, and so on.

...

Whenever an unconvertible offset is reached during conversion of a code
unit sequence:
1. The maximal subpart at that offset should be replaced by a single
   U+FFFD.
2. The conversion should proceed at the offset immediately after the
   maximal subpart.
---

Fast, simple and according to the standard. Best of all - no stinkin' 
exceptions! ;)
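
To make the contrast between substitution and throwing concrete, here is
a minimal sketch. It is written in Python and uses Python's built-in
UTF-8 codec purely for illustration; it is not Phobos code and says
nothing about how std.utf.decode is or should be implemented. The helper
names decode_lenient and decode_strict are hypothetical. As far as I
know, CPython 3.3 and later follow the maximal-subpart practice quoted
above when errors="replace" is used.

```python
# Illustration only: Python's built-in UTF-8 codec, not Phobos/std.utf.
# decode_lenient and decode_strict are hypothetical helper names.


def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for each ill-formed subsequence."""
    return data.decode("utf-8", errors="replace")


def decode_strict(data: bytes) -> str:
    """Decode UTF-8, raising UnicodeDecodeError on the first bad byte."""
    return data.decode("utf-8", errors="strict")


if __name__ == "__main__":
    bad = b"a\xffb"  # 0xFF can never appear in well-formed UTF-8

    # Lenient path: decoding continues past the bad byte, no exception.
    print(ascii(decode_lenient(bad)))

    # Strict path: the same input aborts decoding with an exception.
    try:
        decode_strict(bad)
    except UnicodeDecodeError as err:
        print("strict decoding raised:", err.reason)
```

The lenient path keeps the good characters on either side of the bad
byte and marks the damage in-band, so the caller can check the result
for U+FFFD afterwards instead of wrapping every decode in a try block.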

-- 
Dmitry Olshansky

