string need to be robust

Jonathan M Davis jmdavisProg at gmx.com
Sun Mar 13 04:59:12 PDT 2011


On Sunday 13 March 2011 04:34:24 ZY Zhou wrote:
> std.utf throw exception instead of crash the program. but you still need to
> add try/catch everywhere.
> 
> My point is: this simple code should work, instead of crash, it is supposed
> to leave all invalid codes untouched and just process the valid parts.
> 
> Stream file = new BufferedFile("sample.txt");
> foreach(char[] line; file) {
>    string s = line.idup.tolower;
> }

I think that it's completely unreasonable to expect all string functions to
worry about whether they're dealing with valid Unicode or not. A lot of string
code works through ranges, which means decoding each code point to UTF-32 (a
dchar). And how is it supposed to do _that_ with invalid UTF-8?
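
To make the decoding problem concrete, here is a minimal sketch, assuming a current Phobos in which std.utf.decode throws UTFException on malformed input. The helper name decodesCleanly is hypothetical and only for illustration; it is not part of std.utf.

```d
import std.stdio : writeln;
import std.utf : decode, UTFException;

// Hypothetical helper: returns true if every code point in s
// can be decoded to a dchar, false as soon as decoding fails.
bool decodesCleanly(string s)
{
    size_t i = 0;
    try
    {
        while (i < s.length)
            cast(void) decode(s, i); // advances i past one code point
    }
    catch (UTFException)
    {
        // There is no dchar to hand back for a malformed sequence.
        return false;
    }
    return true;
}

void main()
{
    // 0xFF can never appear in well-formed UTF-8.
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62];
    string bad = cast(string) raw;

    writeln(decodesCleanly("abc"));
    writeln(decodesCleanly(bad));
}
```

Any range-based function that iterates a string by code point runs into the same wall: at the bad byte there is simply no code point to produce.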

I don't know how you expect to really be able to do anything with invalid UTF-8
anyway. There may be something that could be added to std.utf to help handle
the situation better, but I think that it's completely unreasonable to expect
all of the string-based and/or range-based functions to be able to handle
invalid Unicode.
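
As a rough sketch of what handling it up front could look like with what current Phobos offers (an assumption on my part, not a proposed addition to std.utf): validate each line once, and fall back to std.encoding.sanitize when validation fails, so that everything downstream only ever sees valid UTF-8. The function name lowerLenient is hypothetical. Note that this replaces the invalid sequences rather than leaving them untouched.

```d
import std.encoding : sanitize;
import std.stdio : writeln;
import std.uni : toLower;
import std.utf : validate, UTFException;

// Hypothetical helper: lower-cases a line, repairing invalid UTF-8 first.
string lowerLenient(const(char)[] line)
{
    string s = line.idup;
    try
    {
        validate(s); // throws UTFException if s is not well-formed
    }
    catch (UTFException)
    {
        s = sanitize(s); // substitutes a replacement character for bad sequences
    }
    return s.toLower;
}

void main()
{
    immutable(ubyte)[] raw = [0x41, 0xFF, 0x42]; // 'A', invalid byte, 'B'
    writeln(lowerLenient("Hello"));
    writeln(lowerLenient(cast(string) raw));
}
```

The point of doing it this way is that the try/catch lives in one place at the input boundary instead of around every string call.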

- Jonathan M Davis

