Error: 4invalid UTF-8 sequence :: How can I catch this?? (or otherwise handle it)

Charles Hixson charleshixsn at earthlink.net
Thu Oct 22 13:21:34 PDT 2009


Charles Hixson wrote:
> I want to read a bunch of files, and if they aren't UTF, then I want to 
> list their names for conversion, or other processing.  How should this 
> be handled?
> 
> try..catch..finally blocks just ignore this error.
OK.
One approach that occurs to me is to read the data in as a byte stream, 
break it into lines, and validate each line.  But validate requires an 
array of chars, so this seems to put me right back where I was.  Unless, 
perhaps, I can cast an array of bytes to an array of chars without 
having it throw "Error: 4invalid UTF-8 sequence", and then validate the 
entire array.  But if I do that, I won't know where the breaks should be, 
so a piece might end with only half of a legitimate UTF-8 character, and 
validate would legitimately throw UTFException even though the file was 
good.
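
For what it's worth, the split-character worry may not apply when the
breaks are made at line feeds: in UTF-8 every byte of a multi-byte
sequence has its high bit set, so a 0x0A byte is always a real line
feed and never half of a character.  A minimal sketch of per-line
validation on raw bytes follows.  It assumes a current D2 Phobos
(std.utf.validate, UTFException, std.algorithm.iteration.splitter), and
firstBadLine is a made-up name, not anything from the original code.

```d
import std.algorithm.iteration : splitter;
import std.utf : validate, UTFException;

/// Hypothetical helper: returns the 1-based number of the first line
/// that is not valid UTF-8, or 0 if every line validates.
size_t firstBadLine(const(ubyte)[] data)
{
    size_t lineNo = 0;
    // Splitting on the byte 0x0A cannot cut a multi-byte character,
    // because continuation bytes are always in the range 0x80..0xBF.
    foreach (line; data.splitter(cast(ubyte) '\n'))
    {
        ++lineNo;
        try
        {
            // The cast only reinterprets the bytes; nothing is decoded
            // until validate walks the slice.
            validate(cast(const(char)[]) line);
        }
        catch (UTFException e)
        {
            return lineNo;
        }
    }
    return 0;
}
```

Validating the whole buffer in one call would work just as well; going
line by line only helps if the line number of the first bad sequence is
worth reporting.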

I'm sure there are ways around that, but it really seems a roundabout 
way to proceed for something that should be easy.

P.S.:  As before, the actual code that throws the error is:

try
{
    lin = fil.readLine;
}
catch    // catch-all: any failure while reading is reported as bad UTF
{
    writefln("File <<" ~ filIter[curFilNdx] ~ ">> is not a valid UTF file.");
    fil.close;
    getLine;    // moves on to the next file, since this one is now closed
    return;
}
finally
{    }
debug (9) writefln("lin = <<" ~ lin ~ ">>");
try
{
    validate(lin);
}
catch (UtfException ue)
{
    writefln("File <<" ~ filIter[curFilNdx] ~ ">> is not a valid UTF file.");
    fil.close;
    getLine;
    return;
}

where fil is a File and getLine is one of my routines that automatically 
switches to the next file if the current file has been closed.
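
For the original goal (listing the names of files that are not UTF), a
rough sketch that skips line reading entirely might look like the
following.  It is written against a current D2 Phobos, not the library
version used above, so std.file.read and UTFException are assumptions
about the environment, and isValidUtf8 and nonUtfFiles are hypothetical
names.  Note that read can itself throw a FileException for an
unreadable file, which this sketch does not handle.

```d
import std.file : read;
import std.utf : validate, UTFException;

/// Hypothetical helper: true if the raw bytes are well-formed UTF-8.
bool isValidUtf8(const(void)[] raw)
{
    try
    {
        // Reinterpret the bytes as chars without decoding them, then
        // let validate check the whole buffer in one pass.
        validate(cast(const(char)[]) raw);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

/// Hypothetical helper: names of the files whose contents fail validation.
string[] nonUtfFiles(string[] paths)
{
    string[] bad;
    foreach (path; paths)
    {
        // read() loads the entire file as untyped bytes (void[]).
        if (!isValidUtf8(read(path)))
            bad ~= path;
    }
    return bad;
}
```

Reading each file whole keeps the check to a single validate call per
file, at the cost of holding the file in memory.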

