Error: 4invalid UTF-8 sequence :: How can I catch this?? (or otherwise handle it)
Charles Hixson
charleshixsn at earthlink.net
Thu Oct 22 13:21:34 PDT 2009
Charles Hixson wrote:
> I want to read a bunch of files, and if they aren't UTF, then I want to
> list their names for conversion, or other processing. How should this
> be handled??
>
> try..catch..finally blocks just ignore this error.
OK.
One approach that occurs to me is to read the data in as a byte stream,
break it into lines, and validate the lines. But validate requires an
array of chars, so this seems to put me right back where I was. Unless,
perhaps, I can cast an array of bytes into an array of chars without
having it throw an "Error: 4invalid UTF-8 sequence", then validate the
entire array. But if I do that, I won't know where the break should be,
so I might only get half of a legitimate UTF-8 character, and so it
would legitimately throw UTFException, even though the file was good.
I'm sure there are ways around that, but it really seems a round-about
way to proceed for something that should be easy.
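
A minimal sketch of the whole-file variant of this idea, assuming a current D2 Phobos (where the exception class is spelled UTFException rather than the UtfException of 2009). The helper name isValidUtf8 is hypothetical. Reading the entire file before validating sidesteps the worry about cutting a multi-byte character in half, at the cost of holding the file in memory:

```d
import std.file : read;
import std.stdio : writefln;
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `data` is well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] data)
{
    // The cast only reinterprets the bytes; no decoding happens here,
    // so the cast itself cannot raise a UTF error.
    auto text = cast(const(char)[]) data;
    try
    {
        validate(text);   // throws UTFException on a malformed sequence
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main(string[] args)
{
    // Assumption: file names are passed on the command line.
    foreach (name; args[1 .. $])
    {
        // std.file.read returns the raw contents as void[].
        auto data = cast(const(ubyte)[]) read(name);
        if (!isValidUtf8(data))
            writefln("File <<%s>> is not a valid UTF file.", name);
    }
}
```

Files that cannot be opened still raise an exception from read in this sketch; that case would need its own handling.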
P.S.: As before, the actual code that throws the error is:
    try
    {
        lin = fil.readLine;
    }
    catch
    {
        writefln("File <<" ~ filIter[curFilNdx] ~ ">> is not a valid UTF file.");
        fil.close;
        getLine;
        return;
    }
    finally
    { }
    debug (9) writefln("lin = <<" ~ lin ~ ">>");
    try
    {
        validate(lin);
    }
    catch (UtfException ue)
    {
        writefln("File <<" ~ filIter[curFilNdx] ~ ">> is not a valid UTF file.");
        fil.close;
        getLine;
        return;
    }
where fil is a File and getLine is one of my routines that automatically
switches to the next file if the current file has been closed.