string need to be robust

Michel Fortin michel.fortin at michelf.com
Sun Mar 13 07:57:07 PDT 2011


On 2011-03-13 10:18:24 -0400, ZY Zhou <rinick at GeeeMail.com> said:

> What if I'm making a text editor with D?
> I know the text has something wrong, I want to open it and fix it. the
> exception won't help, if the editor just refuse to open invalid file, then
> the editor is useless.
> Try open an invalid utf file with a text editor, like vim, you will understand
> what I mean

But what is the best thing to do when you get an invalid UTF file in a 
text editor? Perhaps you should show a warning to the user, perhaps you 
should also ask the user to select the right text encoding (because it 
might simply not be UTF-8), or perhaps you want to silently ignore the 
error and show an invalid character marker at the right point in the 
text. All of these options are valid, and the programming language 
shouldn't decide that for you.
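
As a rough illustration of the options above, here is a minimal sketch 
in Python (not D, and not a proposal for Phobos). The function name 
load_text, its policy parameter and the latin-1 fallback are all 
hypothetical, chosen only for this example. The decoder is strict by 
default, and the application picks what happens on invalid input.

```python
def load_text(data: bytes, policy: str = "strict", fallback: str = "latin-1") -> str:
    """Decode bytes as UTF-8; the caller chooses how invalid input is handled.

    'policy' and 'fallback' are hypothetical names used for illustration.
    """
    if policy == "strict":
        # Default: raise UnicodeDecodeError on the first invalid sequence,
        # so the caller can warn the user or abort.
        return data.decode("utf-8")
    if policy == "replace":
        # Editor-style behaviour: keep going and put U+FFFD (the Unicode
        # replacement character) where the invalid bytes were.
        return data.decode("utf-8", errors="replace")
    if policy == "fallback":
        # The file may simply not be UTF-8: retry with another encoding,
        # which a real editor would probably ask the user to select.
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            return data.decode(fallback)
    raise ValueError("unknown policy: " + policy)
```

A text editor might call load_text(raw, policy="replace"), while a 
program that must not accept corrupt input keeps the strict default and 
handles the exception itself.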

So I'd point out that a text file editor is a special use case: most 
programs aren't text file editors and don't share this concern. In the 
same vein, HTML parsers are also a special case that should know how to 
handle encodings. In fact, HTML 5 explicitly defines how to deal with 
invalid UTF-8 sequences:
<http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#utf-8>

There are many good ways to deal with invalid UTF-8 sequences. Throwing 
an exception seems like the most robust one to me since it protects 
against invalid input. What to do with invalid input belongs in the 
application logic, not the language.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



More information about the Digitalmars-d mailing list