Transparent ANSI to UTF-8 conversion

Lubos Pintes lubos.pintes at gmail.com
Wed Feb 27 12:35:00 PST 2013


I don't understand the CTFE usage in this context. I thought about 
something like
dchar[] windows_1250=[...];
Isn't this enough?
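
To make the question concrete, here is a minimal sketch of the plain-table idea, assuming a 256-entry table indexed by the Windows-1250 byte value. The helper names makeTable and toUtf8 are made up for illustration, and only a few of the high-half entries are filled in; a real table would need the complete code page mapping.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: builds a byte -> code point table.
// Only a few Windows-1250 entries are filled in for this sketch;
// all other high-half bytes map to U+FFFD as a placeholder.
dchar[] makeTable()
{
    auto t = new dchar[256];
    foreach (i; 0 .. 256)
        t[i] = cast(dchar)(i < 0x80 ? i : 0xFFFD); // ASCII maps to itself
    t[0x8A] = '\u0160'; // Š
    t[0x9A] = '\u0161'; // š
    t[0xE1] = '\u00E1'; // á
    t[0xE8] = '\u010D'; // č
    t[0xF4] = '\u00F4'; // ô
    return t;
}

// Hypothetical helper: maps every input byte to a dchar, then lets
// std.conv.to transcode the resulting UTF-32 array to UTF-8.
string toUtf8(const(ubyte)[] input, const(dchar)[] table)
{
    auto decoded = new dchar[input.length];
    foreach (i, b; input)
        decoded[i] = table[b];
    return decoded.to!string;
}

void main()
{
    auto windows_1250 = makeTable();
    ubyte[] raw = [0xE8, 0x61, 0x6A]; // "čaj" encoded in Windows-1250
    writeln(toUtf8(raw, windows_1250));
}
```

The table lookup is the only code-page-specific part; the UTF-32 to UTF-8 step is left to the library.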
Thanks

On 27. 2. 2013 18:32 Dmitry Olshansky wrote:
> 27-Feb-2013 16:20, monarch_dodra wrote:
>> On Wednesday, 27 February 2013 at 10:56:16 UTC, Lubos Pintes wrote:
>>> Hi,
>>> I would like to transparently convert from ANSI to UTF-8 when dealing
>>> with text files. For example here in Slovakia, virtually every text
>>> file is in Windows-1250.
>>> If someone opens a text file, he or she expects that it will work
>>> properly. So I suppose that it is not feasible to tell someone "if
>>> you want to use my program, please convert every text to UTF-8".
>>>
>>> Obtaining the mapping from ANSI to Unicode for a particular code page is
>>> trivial. Maybe even MultiByteToWideChar could help with this.
>>>
>>> However, I need to know how to do it the "D way". Could I define something
>>> like a TextReader class? Or perhaps some support already exists somewhere?
>>> Thanks
>>
>> I'd say the D way would be to simply exploit the fact that UTF is built
>> into the language, and as such, not worry about encoding, and use raw
>> code points.
>>
>> You get your "code page to Unicode *code point*" table, and then you simply
>> map each character to a dchar. From there, D will itself convert your
>> raw unicode (aka UTF-32) to UTF8 on the fly, when you need it. For
>> example, writing to a file will automatically convert input to UTF-8.
>> You can also simply use std.conv.to!string to convert any UTF scheme to
>> UTF-8 (or any other UTF too for that matter).
>
> A table that translates ANSI to UTF-8 is trivially constructible
> using CTFE from the static one that does ANSI -> dchar.
>>
>> This may not be as efficient as a "true" "codepage to UTF8 table" but:
>> 1) Given you'll most probably be IO bound anyways, who cares?
>
> With in-memory transcoding you won't be. Text editors are typically all
> in-memory or mmap-ed.
>
>> 2) Scalability. D does everything but the code page to code point
>> mapping. Why bother doing any more than that?
>
>
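
If I read the CTFE suggestion quoted above correctly, it might look roughly like the sketch below: the ANSI -> dchar table is the only hand-written data, and a byte -> UTF-8 string table is derived from it during compilation. All names here (makeCodePoints, encodeUtf8, makeUtf8Table, transcode) are hypothetical, and the code point table is deliberately incomplete.

```d
import std.stdio : writeln;

// Hypothetical, incomplete ANSI -> dchar table for Windows-1250.
// Unfilled high-half bytes map to U+FFFD as a placeholder.
dchar[256] makeCodePoints() pure
{
    dchar[256] t;
    foreach (i; 0 .. 256)
        t[i] = cast(dchar)(i < 0x80 ? i : 0xFFFD);
    t[0x8A] = '\u0160'; // Š
    t[0x9A] = '\u0161'; // š
    t[0xE8] = '\u010D'; // č
    return t;
}

// Encodes one code point as UTF-8 by hand, so the sketch depends on
// no library routine being usable at compile time.
string encodeUtf8(dchar c) pure
{
    string s;
    if (c < 0x80)
        s ~= cast(char) c;
    else if (c < 0x800)
    {
        s ~= cast(char)(0xC0 | (c >> 6));
        s ~= cast(char)(0x80 | (c & 0x3F));
    }
    else if (c < 0x10000)
    {
        s ~= cast(char)(0xE0 | (c >> 12));
        s ~= cast(char)(0x80 | ((c >> 6) & 0x3F));
        s ~= cast(char)(0x80 | (c & 0x3F));
    }
    else
    {
        s ~= cast(char)(0xF0 | (c >> 18));
        s ~= cast(char)(0x80 | ((c >> 12) & 0x3F));
        s ~= cast(char)(0x80 | ((c >> 6) & 0x3F));
        s ~= cast(char)(0x80 | (c & 0x3F));
    }
    return s;
}

// Derives the byte -> UTF-8 table from the code point table.
string[256] makeUtf8Table(const dchar[256] codePoints) pure
{
    string[256] result;
    foreach (i, c; codePoints)
        result[i] = encodeUtf8(c);
    return result;
}

// Module-level initializers must be compile-time constants, so both
// function calls below are evaluated via CTFE.
immutable dchar[256] windows1250 = makeCodePoints();
immutable string[256] windows1250Utf8 = makeUtf8Table(windows1250);

// Transcodes by concatenating the precomputed UTF-8 sequences.
string transcode(const(ubyte)[] input)
{
    string result;
    foreach (b; input)
        result ~= windows1250Utf8[b];
    return result;
}

void main()
{
    writeln(transcode([0xE8, 0x61, 0x6A])); // "čaj" in Windows-1250
}
```

Compared with a plain dchar table, this skips the intermediate UTF-32 array at run time, which would matter mostly for the in-memory case mentioned above.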


