regarding Latin1 to UTF8 encoding

Hugo Florentino hugo at acdam.cu
Sun Dec 8 19:07:39 PST 2013


On Mon, 09 Dec 2013 03:44:19 +0100, Adam D. Ruppe wrote:
> On Monday, 9 December 2013 at 02:40:29 UTC, Hugo Florentino wrote:
>> auto input = readText("myfile.htm");
>
> Don't use readText if the file isn't UTF-8; readText assumes its input is UTF-8.
>
> I've never actually used std.encoding (I wrote my own encoding module
> for my dom.d, which I used for website scraping too) but I think this
> is what you want:
>
> Latin1String input = cast(Latin1String) std.file.read("myfile.htm");
> string buffer;
> transcode(input, buffer);
> auto output = replace(buffer, re1, re2);
>
>
> see if that works

Actually, it did work, even with the type of input left as auto.
It seems the explicit typecast to Latin1String was the required
element for it to work, which makes sense now that I think about it.
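
For the record, here is a minimal sketch of what worked, wrapped in a
function. The name readLatin1 is my own and purely illustrative; the
calls are std.file.read and std.encoding.transcode as suggested above.
The cast only tells the compiler to treat the raw bytes as Latin-1;
transcode does the actual conversion.

```d
import std.encoding : Latin1String, transcode;
import std.file : read;

/// Hypothetical helper: reads a Latin-1 file and returns it as UTF-8.
string readLatin1(string path)
{
    // read() returns void[]; the cast reinterprets those bytes as
    // Latin-1 code units without changing them.
    auto input = cast(Latin1String) read(path);

    string buffer;
    transcode(input, buffer); // the actual Latin-1 -> UTF-8 conversion
    return buffer;
}
```

The result is an ordinary string, so it can be passed to replace() as
in the quoted snippet.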

Thanks a lot for the (amazingly quick) reply ;)

Now, if I may add a closely related question:

Suppose "myfile.txt" was given to me daily by careless people who 
usually save it as Latin1 but from time to time might save it as UTF8.
Is there a way to detect the encoding prior to typecasting/loading the 
file?
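
One possible approach, sketched here only as a heuristic: check
whether the raw bytes are well-formed UTF-8 and fall back to Latin1
otherwise. This assumes the file can only be one of those two
encodings, and that Latin1 text containing accented characters will
practically never happen to be valid UTF-8. The name decodeGuessing is
hypothetical; std.utf.validate throws a UTFException on malformed
input.

```d
import std.encoding : Latin1String, transcode;
import std.utf : UTFException, validate;

/// Hypothetical helper: returns the bytes as a UTF-8 string, guessing
/// between UTF-8 and Latin-1. This is a heuristic, not real detection.
string decodeGuessing(const(ubyte)[] raw)
{
    auto asUtf8 = cast(const(char)[]) raw;
    try
    {
        validate(asUtf8);    // throws UTFException on malformed UTF-8
        return asUtf8.idup;  // already UTF-8 (pure ASCII also ends up here)
    }
    catch (UTFException)
    {
        // Not valid UTF-8, so assume Latin-1 and convert.
        string buffer;
        transcode(cast(Latin1String) raw.idup, buffer);
        return buffer;
    }
}
```

It would be called as
decodeGuessing(cast(const(ubyte)[]) std.file.read("myfile.txt")).
A UTF-8 file that starts with a byte order mark validates as UTF-8 and
keeps the mark in the returned string.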

Regards, Hugo

