invalid utf-8 sequence

james Jamesg4 at gmail.com
Tue Jan 6 18:20:37 PST 2009


Jarrett Billingsley Wrote:

> On Tue, Jan 6, 2009 at 8:04 PM, james <Jamesg4 at gmail.com> wrote:
> > I'm writing an indexer, but I'm having a problem: on some files, reading gives this error:
> >
> > Error 4: invalid UTF-8 sequence
> >
> > Is there a way to fix it?
> >
> 
> You're probably reading a file that's encoded in some non-Unicode
> encoding, like Latin-1.  You could read in the file data as byte[]
> instead of as char[], but that still doesn't deal with the problem
> that you have characters in your file that are outside the ASCII
> range.  If you know what encoding your file uses, you could do some
> transformations on it to turn it into valid Unicode, or you could just
> ignore characters outside the ASCII range :P

Is there any library or function that can automatically convert these unknown HTML charsets into UTF-8?
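
For reference, a minimal sketch of the approach suggested in the quoted reply: read the data as bytes, check whether it is already valid UTF-8, and transcode it otherwise. It assumes a current (D2) Phobos with std.utf.validate and std.encoding.transcode. The helper name toUtf8 is hypothetical, and so is the Latin-1 fallback: this does not detect an unknown charset, it only guesses one.

```d
import std.encoding : Latin1String, transcode;
import std.utf : UTFException, validate;

/// Hypothetical helper: returns `raw` as a UTF-8 string.
/// ASSUMPTION: any input that is not valid UTF-8 is treated as Latin-1.
string toUtf8(const(ubyte)[] raw)
{
    auto text = cast(const(char)[]) raw;
    try
    {
        validate(text);     // throws UTFException on an invalid sequence
        return text.idup;   // already valid UTF-8, keep as-is
    }
    catch (UTFException)
    {
        // Reinterpret the bytes as Latin-1 and transcode them to UTF-8.
        string result;
        transcode(cast(Latin1String) raw.idup, result);
        return result;
    }
}
```

It could be called as toUtf8(cast(const(ubyte)[]) std.file.read(path)), where path is whatever file the indexer is processing. Because every byte sequence is valid Latin-1, the fallback never fails, but it will produce the wrong characters if the file is really in some other encoding. For HTML, the charset declared in the page or in the HTTP headers is a better guide than guessing.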



More information about the Digitalmars-d-learn mailing list