string need to be robust

ZY Zhou rinick at GeeMail.com
Sun Mar 13 05:25:22 PDT 2011


> but I think that it's completely unreasonable to expect
> all of the string-based and/or range-based functions to be able to handle
> invalid unicode.

As I explained in the first mail, if the UTF-8 parser converts every invalid UTF-8
byte to a low surrogate code point (0x80~0xFF ->
0xDC80~0xDCFF), other string-related functions will still work fine, and you can
still handle these errors if you want:

string s = "\xa0";
foreach(dchar d; s) {
  if (isValidUnicode(d)) {
    process(d);
  } else {
    handleError(d);
  }
}
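
A minimal sketch of the mapping described above, for illustration only. The names decodeLenient and isEscapedByte are hypothetical and not part of Phobos; the sketch assumes that std.utf.decode throws UTFException on malformed input and that skipping exactly one byte after a failure is an acceptable recovery policy.

```d
import std.utf : decode, UTFException;

/// Hypothetical helper (not in Phobos): decodes UTF-8 and maps every
/// undecodable byte 0x80..0xFF to the low surrogate 0xDC80..0xDCFF.
dchar[] decodeLenient(const(char)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        immutable start = i;
        try
        {
            // On success, decode advances i past one complete sequence.
            result ~= decode(s, i);
        }
        catch (UTFException)
        {
            // Assumed policy: keep the offending byte as 0xDC00 + byte
            // and resume decoding at the very next byte.
            result ~= cast(dchar)(0xDC00 + cast(ubyte) s[start]);
            i = start + 1;
        }
    }
    return result;
}

/// Hypothetical check: true if d stands for an escaped invalid byte.
bool isEscapedByte(dchar d)
{
    return d >= 0xDC80 && d <= 0xDCFF;
}
```

With this sketch, the three bytes 0x61 0xA0 0x62 decode to 'a', 0xDCA0, 'b', so the original byte can be recovered by subtracting 0xDC00, and code that does not care about the error simply sees one extra code point.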


== Quote from Jonathan M Davis (jmdavisProg at gmx.com)'s article
> On Sunday 13 March 2011 04:34:24 ZY Zhou wrote:
> > std.utf throws an exception instead of crashing the program, but you still
> > need to add try/catch everywhere.
> >
> > My point is: this simple code should work instead of crashing; it is supposed
> > to leave all invalid codes untouched and just process the valid parts.
> >
> > Stream file = new BufferedFile("sample.txt");
> > foreach(char[] line; file) {
> >    string s = line.idup.tolower;
> > }
> I think that it's completely unreasonable to expect all string functions to
> worry about whether they're dealing with valid unicode or not. And a lot of
> string stuff would involve ranges which would require converting each code point
> to UTF-32. And how is it supposed to do _that_ with invalid UTF-8?
> I don't know how you expect to really be able to do anything with invalid UTF-8
> anyway. There may be something that could be added to std.utf to help better
> handle the situation, but I think that it's completely unreasonable to expect
> all of the string-based and/or range-based functions to be able to handle
> invalid unicode.
> - Jonathan M Davis


