The Case Against Autodecode

Walter Bright via Digitalmars-d digitalmars-d at puremagic.com
Fri May 27 16:26:20 PDT 2016


On 5/27/2016 11:27 AM, Andrei Alexandrescu wrote:
> On 5/27/16 1:11 PM, Walter Bright wrote:
>> They mean code units.
>
> Always valid or potentially invalid as well? -- Andrei

Some years ago I would have said always valid. Experience, however, says that 
Unicode is often dirty and code should be tolerant of that.

Consider Unicode in a text editor. You can't have it throwing exceptions, 
silently changing things to replacement characters, etc., when there are a few 
invalid sequences in it. You also can't just say "the file isn't Unicode" and 
refuse to display the Unicode in it.

It isn't hard to deal with invalid Unicode in a user-friendly manner.
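Here is a minimal sketch of that kind of tolerance, in Python for illustration only, since the post names no language or API. It assumes text is held as raw UTF-8 code units. The hypothetical `split_utf8` splits the buffer into runs of valid and invalid bytes without throwing and without rewriting anything. The hypothetical `render` then shows invalid bytes as visible `\xNN` escapes instead of replacement characters. Validation is delegated to Python's own UTF-8 decoder; that is a choice for brevity, not the only way to do it.

```python
from typing import List, Tuple


def split_utf8(buf: bytes) -> List[Tuple[bool, bytes]]:
    """Split raw UTF-8 code units into (is_valid, chunk) runs.

    Never raises on bad input and never alters it: joining the chunks
    in order gives back exactly `buf`.
    """
    runs: List[Tuple[bool, bytes]] = []
    pos = 0
    while pos < len(buf):
        try:
            buf[pos:].decode("utf-8")          # rest of the buffer is valid
            runs.append((True, buf[pos:]))
            break
        except UnicodeDecodeError as e:
            # e.start / e.end are offsets into buf[pos:]; bytes before
            # e.start decoded cleanly, bytes in [e.start, e.end) did not.
            if e.start > 0:
                runs.append((True, buf[pos:pos + e.start]))
            runs.append((False, buf[pos + e.start:pos + e.end]))
            pos += e.end
    return runs


def render(buf: bytes) -> str:
    """Text for display: valid runs decoded, invalid bytes shown as \\xNN."""
    parts = []
    for ok, chunk in split_utf8(buf):
        if ok:
            parts.append(chunk.decode("utf-8"))
        else:
            parts.append("".join(f"\\x{b:02X}" for b in chunk))
    return "".join(parts)


if __name__ == "__main__":
    data = b"caf\xc3\xa9 \xff ok"              # one stray 0xFF byte
    print(render(data))                        # café \xFF ok
```

Because the underlying bytes are never modified, saving the file writes back exactly what was loaded, and an editor could use the invalid runs to highlight the bad spots for the user.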

