The Case Against Autodecode

Fri May 27 16:26:20 PDT 2016

On 5/27/2016 11:27 AM, Andrei Alexandrescu wrote:
> On 5/27/16 1:11 PM, Walter Bright wrote:
>> They mean code units.
>
> Always valid or potentially invalid as well? -- Andrei

Some years ago I would have said always valid. Experience, however, says that 
Unicode is often dirty and code should be tolerant of that.

Consider Unicode in a text editor. You can't have it throwing exceptions, 
silently changing things to replacement characters, etc., when there are a few 
invalid sequences in it. You also can't just say "the file isn't Unicode" and 
refuse to display the Unicode in it.

It isn't hard to deal with invalid Unicode in a user-friendly manner.
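
To make the text editor case concrete, here is a minimal sketch of one way to be tolerant. It is written in Python only because the example is short there; it is not D or Phobos code. It assumes the editor keeps the buffer as a decoded string and relies on Python's standard "surrogateescape" error handler, which maps each undecodable byte to a lone surrogate in the range U+DC80..U+DCFF instead of throwing or substituting U+FFFD. The names load, display and save are hypothetical, made up for this illustration.

```python
def load(raw: bytes) -> str:
    # Decode as UTF-8. An invalid byte 0xXX does not raise; it is carried
    # along as the lone surrogate U+DCXX so nothing is lost.
    return raw.decode("utf-8", errors="surrogateescape")


def display(text: str) -> str:
    # Render valid characters as they are and show each carried-over
    # invalid byte as a visible <XX> marker (an arbitrary choice here).
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("<%02X>" % (cp - 0xDC00))
        else:
            out.append(ch)
    return "".join(out)


def save(text: str) -> bytes:
    # Re-encode; carried-over surrogates turn back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


# Valid text, a stray 0xFF, and a truncated two-byte sequence at the end.
raw = b"caf\xc3\xa9 \xff ok \xc3"
text = load(raw)
assert display(text) == "café <FF> ok <C3>"
assert save(text) == raw  # the file round-trips byte for byte
```

The point of the sketch is only that the valid parts are displayed, the invalid parts are shown rather than hidden, and saving an untouched buffer does not alter the file.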