invalid utf-8 sequence
Stewart Gordon
smjg_1998 at yahoo.com
Wed Jan 7 13:44:48 PST 2009
james wrote:
> Jarrett Billingsley wrote:
>
>> On Tue, Jan 6, 2009 at 8:04 PM, james <Jamesg4 at gmail.com> wrote:
>>> I'm writing an indexer, but I'm having a problem: on some files, reading gives this error
>>>
>>> Error 4: invalid UTF-8 sequence
>>>
>>> Is there a way to fix it?
>>
>> You're probably reading a file that's encoded in some non-Unicode
>> encoding, like Latin-1. You could read in the file data as byte[]
>> instead of as char[], but that still doesn't deal with the problem
>> that you have characters in your file that are outside the ASCII
>> range. If you know what encoding your file uses, you could do some
>> transformations on it to turn it into valid Unicode, or you could just
>> ignore characters outside the ASCII range :P
>
> Is there any library or function that can automatically convert these unknown HTML charsets into UTF-8?
You mean that tries to work out what character set a file is in and then
translates it?
(What is the current state of the art of character set detection
heuristics?)
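
Short of real detection, the byte[] approach Jarrett describes can be
combined with a crude fallback: accept the data if it is already valid
UTF-8, and otherwise treat every byte as Latin-1. This is only a minimal
sketch, assuming current D2 Phobos (std.utf.validate and UTFException).
The name toUtf8Guess is hypothetical, and the Latin-1 fallback is an
assumption about the input, not a detection heuristic.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: returns `raw` as UTF-8 text.
/// Valid UTF-8 passes through unchanged; anything else is
/// ASSUMED (not detected) to be Latin-1 and transcoded.
string toUtf8Guess(const(ubyte)[] raw)
{
    auto text = cast(const(char)[]) raw;
    try
    {
        validate(text);   // throws UTFException on a malformed sequence
        return text.idup;
    }
    catch (UTFException)
    {
        string result;
        foreach (b; raw)
            result ~= cast(dchar) b;   // Latin-1 byte value == Unicode code point
        return result;
    }
}

void main()
{
    // "café" in Latin-1: a lone 0xE9 byte is not valid UTF-8.
    ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9];
    assert(toUtf8Guess(latin1) == "café");

    // The same word already in UTF-8 (0xC3 0xA9) is left alone.
    ubyte[] utf8 = [0x63, 0x61, 0x66, 0xC3, 0xA9];
    assert(toUtf8Guess(utf8) == "café");
}
```

For a file on disk, something like
toUtf8Guess(cast(const(ubyte)[]) std.file.read(path)) should work. For
HTML in particular, the charset declared in the HTTP header or a meta
tag, when present, is a better guide than any guess.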
Stewart.