invalid utf-8 sequence

Jarrett Billingsley jarrett.billingsley at gmail.com
Tue Jan 6 19:23:48 PST 2009


On Tue, Jan 6, 2009 at 9:20 PM, james <Jamesg4 at gmail.com> wrote:
> Jarrett Billingsley Wrote:
>
>> On Tue, Jan 6, 2009 at 8:04 PM, james <Jamesg4 at gmail.com> wrote:
>> > I'm writing an indexer, but I'm having a problem: on some files, reading gives this error
>> >
>> > Error 4: invalid UTF-8 sequence
>> >
>> > Is there a way to fix it?
>> >
>>
>> You're probably reading a file that's encoded in some non-Unicode
>> encoding, like Latin-1.  You could read in the file data as byte[]
>> instead of as char[], but that still doesn't deal with the problem
>> that you have characters in your file that are outside the ASCII
>> range.  If you know what encoding your file uses, you could do some
>> transformations on it to turn it into valid Unicode, or you could just
>> ignore characters outside the ASCII range :P
>
> Is there any library or function that can automatically convert these unknown HTML charsets into UTF-8?

Not that I know of, for D anyway.
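
If the encoding is known, though, the transformation mentioned above is short enough to write by hand. Here is a minimal sketch in D2, assuming the file is Latin-1 (ISO-8859-1), where every byte value equals the Unicode code point it represents. The names latin1ToUtf8 and stripNonAscii are hypothetical helpers made up for this example, not library functions, and the sketch does nothing to detect an unknown charset.

```d
/// Hypothetical helper: convert Latin-1 (ISO-8859-1) bytes to UTF-8.
/// Assumes the input really is Latin-1; no detection is attempted.
string latin1ToUtf8(const(ubyte)[] data)
{
    char[] result;
    foreach (b; data)
    {
        if (b < 0x80)
        {
            // ASCII range: identical in Latin-1 and UTF-8.
            result ~= cast(char) b;
        }
        else
        {
            // Code points 0x80..0xFF become a two-byte UTF-8 sequence:
            // 110000xx 10xxxxxx
            result ~= cast(char)(0xC0 | (b >> 6));
            result ~= cast(char)(0x80 | (b & 0x3F));
        }
    }
    return result.idup;
}

/// Hypothetical helper: the cruder option, dropping every byte
/// outside the ASCII range.
string stripNonAscii(const(ubyte)[] data)
{
    char[] result;
    foreach (b; data)
    {
        if (b < 0x80)
            result ~= cast(char) b;
    }
    return result.idup;
}
```

To use either one, read the file as raw bytes rather than as char[], for example with cast(ubyte[]) std.file.read(name), and pass the result through the helper before treating it as text. Dropping non-ASCII bytes loses data, so it is probably only acceptable for an indexer that cares about plain English words.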

