Proper way to fix core.exception.UnicodeException at src\rt\util\utf.d(292): invalid UTF-8 sequence by File.readln()

Dr.No jckj33 at gmail.com
Fri Apr 6 16:10:56 UTC 2018


I'm reading a CSV file provided by the user line by line; the 
file is assumed to be UTF-8. But a user has provided an 
ANSI-encoded file, which resulted in the error:

>core.exception.UnicodeException at src\rt\util\utf.d(292): invalid 
>UTF-8 sequence

(It happened when the user took the original UTF-8 encoded file 
generated by another application, made some edits using an editor 
(whose name I don't know), then saved it, not aware that it was 
changing the encoding to ANSI.)

My question is: what's the proper way to solve that? Using toUTF8 
didn't solve it:

> while((line = csvFile.readln().toUTF8) !is null) {

I didn't find a way to explicitly set the encoding with 
std.stdio.File so that lines come out as UTF-8 regardless of 
whether the file is ANSI or already UTF-8.
I don't want to convert the whole file to UTF-8: the CSV file can 
be large and that might take quite a while. And if I did, it 
would have to be on a temporary copy of the file (which would 
make things even slower) to avoid touching the user's original 
file.

I thought of writing my own readLine() with 
std.stdio.File.byChunk to take as many bytes as possible until a 
'\n' byte is seen, treat them as UTF-8 and return.

But I'd like to not reinvent the wheel and use something native, 
if possible. Any ideas?
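
For illustration, the byChunk idea could be sketched roughly as 
below. This is only a sketch under assumptions, not a known-good 
answer: it assumes "ANSI" here means Windows-1252 (the real code 
page is unknown), and the names decodeLine and eachLine are made 
up for this example. Each line is checked with std.utf.validate 
and only transcoded with std.encoding.transcode when it is not 
valid UTF-8, so the file is never converted as a whole.

```d
import std.encoding : transcode, Windows1252String;
import std.stdio : File;
import std.utf : UTFException, validate;

/// Hypothetical helper: returns the line unchanged if it is valid
/// UTF-8, otherwise reinterprets the bytes as Windows-1252 (an
/// assumption about what "ANSI" means) and transcodes to UTF-8.
string decodeLine(const(ubyte)[] raw)
{
    auto text = cast(string) raw.idup;
    try
    {
        validate(text); // throws UTFException on malformed UTF-8
        return text;
    }
    catch (UTFException)
    {
        string converted;
        transcode(cast(Windows1252String) raw.idup, converted);
        return converted;
    }
}

/// Hypothetical helper: splits the file on the '\n' byte and passes
/// each decoded line (without the line terminator) to `sink`.
void eachLine(File f, void delegate(string) sink)
{
    void emit(const(ubyte)[] raw)
    {
        // Drop a trailing '\r' left over from CRLF line endings.
        if (raw.length && raw[$ - 1] == '\r')
            raw = raw[0 .. $ - 1];
        sink(decodeLine(raw));
    }

    ubyte[] pending; // bytes of a line that is not complete yet
    foreach (ubyte[] chunk; f.byChunk(64 * 1024))
    {
        pending ~= chunk; // copies, since byChunk reuses its buffer
        size_t start = 0;
        foreach (i, b; pending)
        {
            if (b == '\n')
            {
                emit(pending[start .. i]);
                start = i + 1;
            }
        }
        pending = pending[start .. $].dup;
    }
    if (pending.length)
        emit(pending); // last line without a trailing newline
}
```

It would be used along the lines of 
eachLine(File("input.csv", "rb"), (string line) { /* parse */ }); 
where the file name is a placeholder. Splitting on the raw '\n' 
byte is safe for both encodings, since that byte never occurs 
inside a multi-byte UTF-8 sequence. One caveat: a Windows-1252 
line whose bytes happen to form valid UTF-8 would be left as-is, 
so the per-line check is a heuristic.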


