Always std.utf.validate, or rely on exceptions?

SimonN via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Mar 2 08:20:30 PST 2017


Many functions in std.utf throw UTFException when we pass them 
malformed UTF, and many functions in std.string throw 
StringException. From this, I developed a habit of reading user 
files like so, hoping that it traps all malformed UTF:

     try {
         // call D standard lib on string from file
     }
     catch (Exception e) {
         // treat file as bogus
         // log e.msg
     }

But std.string.stripRight!string calls std.utf.codeLength, which 
never throws on an invalid code point; it hits assert(false) 
instead:

     ubyte codeLength(C)(dchar c) @safe pure nothrow @nogc
         if (isSomeChar!C)
     {
         static if (C.sizeof == 1)
         {
             if (c <= 0x7F) return 1;
             if (c <= 0x7FF) return 2;
             if (c <= 0xFFFF) return 3;
             if (c <= 0x10FFFF) return 4;
             assert(false);
         }
         // ...
     }

Apparently, before my code calls stripRight, I must already be 
sure that the string contains only well-formed UTF. Right now, my 
code doesn't guarantee that.
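
To make the problem concrete: a failed assert raises 
core.exception.AssertError, which derives from Error rather than 
Exception, so the catch (Exception e) pattern above cannot trap it. 
A minimal sketch, assuming a non-release build (in -release, 
assert(false) becomes a halt instruction and ordinary asserts are 
removed). whoCatches is a hypothetical helper, and its assert only 
stands in for the one inside codeLength:

```d
import core.exception : AssertError;

// Hypothetical demo helper: reports which catch clause sees the failure.
string whoCatches(dchar c)
{
    string result = "nothing";
    try
    {
        try
        {
            // Stand-in for the range check that ends in assert(false).
            assert(c <= 0x10FFFF, "invalid code point");
        }
        catch (Exception e)
        {
            result = "Exception";   // never reached for an assert failure
        }
    }
    catch (AssertError e)           // demo only; catching Errors is discouraged
    {
        result = "AssertError";
    }
    return result;
}

void main()
{
    assert(whoCatches('A') == "nothing");
    assert(whoCatches(cast(dchar) 0x110000) == "AssertError");
}
```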

Should I always validate text from files manually with 
std.utf.validate?

Or should I memorize which functions throw, then validate 
manually whenever I call the non-throwing UTF functions? What is 
the pattern behind what throws and what asserts false?
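
For reference, the first option (validate once, right after 
reading) might look like the sketch below. isWellFormed is a 
hypothetical helper, not a Phobos function; std.utf.validate and 
UTFException are the real names. As far as I can tell from the 
docs, std.file.readText already runs validate on what it reads, so 
this mainly matters for text that arrives some other way, e.g. via 
std.file.read and a cast.

```d
import std.string : stripRight;
import std.utf : UTFException, validate;

// Hypothetical helper: true if text is well-formed UTF-8.
bool isWellFormed(string text)
{
    try
    {
        validate(text);     // throws UTFException on malformed UTF
        return true;
    }
    catch (UTFException e)
    {
        return false;       // caller treats the file as bogus
    }
}

void main()
{
    immutable(ubyte)[] goodBytes = [0x68, 0x69, 0x20, 0x20]; // "hi  "
    immutable(ubyte)[] badBytes = [0x68, 0xFF, 0x20];  // 0xFF is never valid UTF-8

    // The casts only reinterpret the bytes; nothing is checked here.
    auto good = cast(string) goodBytes;
    auto bad = cast(string) badBytes;

    assert(isWellFormed(good));
    assert(!isWellFormed(bad));

    // Only after validation is it safe to call the non-throwing functions.
    if (isWellFormed(good))
        assert(good.stripRight == "hi");
}
```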

-- Simon

