invalid utf-8 sequence

Tue Jan 6 19:34:23 PST 2009

Jarrett Billingsley Wrote:

> On Tue, Jan 6, 2009 at 9:20 PM, james <Jamesg4 at gmail.com> wrote:
> > Jarrett Billingsley Wrote:
> >
> >> On Tue, Jan 6, 2009 at 8:04 PM, james <Jamesg4 at gmail.com> wrote:
> >> > I'm writing an indexer, but on some files I get this error when I read them:
> >> >
> >> > Error 4: invalid UTF-8 sequence
> >> >
> >> > Is there a way to fix it?
> >> >
> >>
> >> You're probably reading a file that's encoded in some non-Unicode
> >> encoding, like Latin-1.  You could read in the file data as byte[]
> >> instead of as char[], but that still doesn't deal with the problem
> >> that you have characters in your file that are outside the ASCII
> >> range.  If you know what encoding your file uses, you could do some
> >> transformations on it to turn it into valid Unicode, or you could just
> >> ignore characters outside the ASCII range :P
> >
> > Is there any library or function that can automatically convert HTML in an unknown charset into UTF-8?
> 
> Not that I know of, for D anyway.

I just found out about a function 'UnicodeFile' in Tango, but I'm using D 1.0 and Phobos, so maybe I should write one of my own.
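
A rough sketch of what such a function might look like, assuming the problem files are Latin-1 (ISO-8859-1) as suggested above. It does not detect the charset; it reads the raw bytes and re-encodes every byte above the ASCII range as a two-byte UTF-8 sequence. The name latin1ToUtf8 is made up for illustration and is not part of Phobos, and the code targets D 1.0.

```d
import std.file;    // std.file.read returns the raw file contents as void[]
import std.stdio;   // writefln

// Hypothetical helper (not part of Phobos): treats every input byte as a
// Latin-1 (ISO-8859-1) character and re-encodes it as UTF-8.
// Assumption: the data really is Latin-1. Windows-1252 and other charsets
// differ and would need a lookup table instead.
char[] latin1ToUtf8(ubyte[] data)
{
    char[] result;
    foreach (ubyte b; data)
    {
        if (b < 0x80)
        {
            // ASCII bytes are identical in Latin-1 and UTF-8.
            result ~= cast(char) b;
        }
        else
        {
            // Code points U+0080..U+00FF need two bytes in UTF-8:
            // 110000xx 10xxxxxx
            result ~= cast(char)(0xC0 | (b >> 6));
            result ~= cast(char)(0x80 | (b & 0x3F));
        }
    }
    return result;
}

unittest
{
    // "café" in Latin-1: the last byte, 0xE9, is not valid UTF-8 on its own.
    ubyte[] input = cast(ubyte[]) "caf\xE9";
    assert(latin1ToUtf8(input) == "caf\xC3\xA9");
}

void main(char[][] args)
{
    if (args.length < 2)
    {
        writefln("usage: %s <file>", args[0]);
        return;
    }
    // Read as bytes rather than char[], so nothing is validated as UTF-8 yet.
    ubyte[] raw = cast(ubyte[]) std.file.read(args[1]);
    char[] text = latin1ToUtf8(raw);
    writefln("%s", text);
}
```

If a file is already valid UTF-8, running it through this would double-encode the non-ASCII characters, so it probably only makes sense as a fallback after the normal read has failed with the invalid UTF-8 error.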