List of Phobos functions that allocate memory?
Dmitry Olshansky
dmitry.olsh at gmail.com
Fri Feb 7 12:14:04 PST 2014
07-Feb-2014 21:07, Andrej Mitrovic writes:
> On 2/7/14, Dmitry Olshansky <dmitry.olsh at gmail.com> wrote:
>> Much simpler - it returns a special dchar to designate a bad encoding. And
>> there is one defined by the Unicode spec.
>
> A NaN for chars? Sounds great to me! :)
>
It's called \uFFFD (U+FFFD) and exists specifically for bad encodings.
I wonder why nobody perused the spec when writing std.utf.decode in the
first place...
5.22 Best Practice for U+FFFD Substitution
When converting text from one character encoding to another, a
conversion algorithm may encounter unconvertible code units. This is
most commonly caused by some sort of corruption of the source data, so
that it does not correctly follow the specification for that character
encoding. Examples include dropping a byte in a multibyte encoding such
as Shift-JIS, improper concatenation of strings, a mismatch between an
encoding declaration and actual encoding of text, use of non-shortest
form for UTF-8, and so on.
...
Whenever an unconvertible offset is reached during conversion of a code
unit sequence:
1. The maximal subpart at that offset should be replaced by a single
   U+FFFD.
2. The conversion should proceed at the offset immediately after the
   maximal subpart.
---
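The two quoted rules map almost directly onto code. Below is a minimal sketch, written in Python rather than D, of a UTF-8 decoder that applies the maximal-subpart rule. It is not Phobos's std.utf.decode. The names `_lead_info`, `decode_one` and `decode_utf8_replace` are hypothetical. The accepted byte ranges follow the well-formed UTF-8 table in the Unicode Standard, which is what rejects overlong forms, surrogates and values above U+10FFFF.

```python
# Hypothetical sketch: UTF-8 decoding that substitutes one U+FFFD per
# maximal subpart (Unicode "Best Practice for U+FFFD Substitution").
REPLACEMENT = "\ufffd"


def _lead_info(lead):
    """Map a lead byte to (continuation bytes needed, allowed range of the
    FIRST continuation byte, payload mask), or None if it cannot start a
    well-formed sequence. Later continuation bytes are always 0x80..0xBF."""
    if 0xC2 <= lead <= 0xDF:
        return 1, 0x80, 0xBF, 0x1F
    if lead == 0xE0:
        return 2, 0xA0, 0xBF, 0x0F   # rejects overlong 3-byte forms
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return 2, 0x80, 0xBF, 0x0F
    if lead == 0xED:
        return 2, 0x80, 0x9F, 0x0F   # rejects surrogates D800..DFFF
    if lead == 0xF0:
        return 3, 0x90, 0xBF, 0x07   # rejects overlong 4-byte forms
    if 0xF1 <= lead <= 0xF3:
        return 3, 0x80, 0xBF, 0x07
    if lead == 0xF4:
        return 3, 0x80, 0x8F, 0x07   # caps the result at U+10FFFF
    return None                      # 80..C1 and F5..FF never start a sequence


def decode_one(data: bytes, i: int):
    """Decode one code point at offset i without ever raising.

    Returns (character, next_offset). On ill-formed input the character
    is U+FFFD and next_offset points just past the maximal subpart.
    """
    lead = data[i]
    if lead < 0x80:
        return chr(lead), i + 1
    info = _lead_info(lead)
    if info is None:
        return REPLACEMENT, i + 1    # maximal subpart of length one
    needed, lo, hi, mask = info
    cp = lead & mask
    j = i + 1
    for _ in range(needed):
        if j >= len(data) or not (lo <= data[j] <= hi):
            # data[i:j] is the maximal subpart: emit a single U+FFFD
            # (rule 1) and resume right after it (rule 2).
            return REPLACEMENT, j
        cp = (cp << 6) | (data[j] & 0x3F)
        j += 1
        lo, hi = 0x80, 0xBF          # range for the remaining continuations
    return chr(cp), j


def decode_utf8_replace(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        ch, i = decode_one(data, i)  # always advances by at least one byte
        out.append(ch)
    return "".join(out)
```

For the input `bytes.fromhex("61F18080E180C262806380BF64")` the sketch returns "a", three U+FFFD, "b", one U+FFFD, "c", two U+FFFD, "d". The truncated `F1 80 80` and `E1 80` each collapse into a single replacement, while the stray `80` and `BF` bytes each get their own.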
Fast, simple and according to the standard. Best of all - no stinkin'
exceptions! ;)
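For comparison, Python's built-in decoder exposes both behaviours. With the default `errors="strict"` it raises `UnicodeDecodeError`, and with `errors="replace"` it substitutes U+FFFD and carries on. The sketch below contrasts the two. The helper names `decode_strict` and `decode_lenient` are hypothetical, and the sample input is an illustrative truncated sequence.

```python
def decode_strict(data: bytes):
    """Exception-based: reports the error and returns None on bad input."""
    try:
        return data.decode("utf-8")          # errors="strict" is the default
    except UnicodeDecodeError as exc:
        print(f"bad UTF-8 at byte {exc.start}: {exc.reason}")
        return None


def decode_lenient(data: bytes) -> str:
    """Replacement-based: ill-formed input becomes U+FFFD; nothing is raised."""
    return data.decode("utf-8", errors="replace")


sample = b"caf\xc3"                           # truncated two-byte sequence
print(decode_strict(sample))
print(ascii(decode_lenient(sample)))          # prints 'caf\ufffd'
```

The lenient variant needs no `try` block at the call site. The price is that the caller must check for U+FFFD if it cares whether the input was well-formed.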
--
Dmitry Olshansky
More information about the Digitalmars-d mailing list