regarding Latin1 to UTF8 encoding
Hugo Florentino
hugo at acdam.cu
Sun Dec 8 19:07:39 PST 2013
On Mon, 09 Dec 2013 03:44:19 +0100, Adam D. Ruppe wrote:
> On Monday, 9 December 2013 at 02:40:29 UTC, Hugo Florentino wrote:
>> auto input = readText("myfile.htm");
>
> Don't use readText if the file isn't UTF-8; readText assumes it is UTF-8.
>
> I've never actually used std.encoding (I wrote my own encoding module
> for my dom.d, which I used for website scraping too) but I think this
> is what you want:
>
> Latin1String input = cast(Latin1String) std.file.read("myfile.htm");
> string buffer;
> transcode(input, buffer);
> auto output = replace(buffer, re1, re2);
>
>
> see if that works
Actually, it did work, even when keeping the type of input as auto.
It seems the explicit cast to Latin1String was the required
element for it to work, which makes sense now that I think about it.
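For reference, here is a minimal sketch of the variant that worked, wrapped in a helper so it can be run on its own. The helper name readLatin1 and the sample file name are made up for illustration; the only library calls are std.file.read and std.encoding.transcode, as in the quoted suggestion.

```d
import std.encoding : Latin1String, transcode;
import std.file : read, remove, write;

/// Hypothetical helper: loads a Latin-1 file and returns it as a UTF-8 string.
string readLatin1(string path)
{
    // std.file.read returns void[]; the explicit cast is what tells
    // transcode the source encoding, so `auto` is enough for the type.
    auto input = cast(Latin1String) read(path);
    string buffer;
    transcode(input, buffer); // Latin-1 -> UTF-8
    return buffer;
}

void main()
{
    // Hypothetical sample file: "café" in Latin-1, where é is the single byte 0xE9.
    immutable ubyte[] raw = [0x63, 0x61, 0x66, 0xE9];
    write("sample_latin1.txt", raw);
    scope (exit) remove("sample_latin1.txt");

    // After transcoding, é is the two-byte UTF-8 sequence for U+00E9.
    assert(readLatin1("sample_latin1.txt") == "caf\u00E9");
}
```

Without the cast there is nothing to say the bytes are Latin-1, which I suppose is why readText failed on the same file.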
Thanks a lot for the (amazingly quick) reply ;)
Now, if I may add a closely related question:
Suppose "myfile.txt" is given to me daily by careless people who
usually save it as Latin1 but from time to time might save it as UTF8.
Is there a way to detect the encoding before casting/loading the
file?
Regards, Hugo