Character set conversions

Jonathan M Davis jmdavisProg at gmx.com
Sun May 29 19:52:23 PDT 2011


On 2011-05-29 19:21, Adam D. Ruppe wrote:
> I've encountered some problems with other charsets recently. Phobos has
> a std.encoding that can do some useful stuff, but there are some
> encodings I've seen in the wild that it can't handle (indeed, the list
> it does support is fairly short).
> 
> I used gnu iconv for one of my projects and it works for me, but
> I wonder:
> 
> Is anyone planning to add more charset support to Phobos?
> (alternatively, am I missing something already there?)
> 
> 
> If not, maybe I'll do a few myself. I've never actually written code
> to do this, but it can't be rocket science. I suspect it's more
> tedious than anything else.

Well, generally the idea is that you just use UTF-8, UTF-16, or UTF-32, and
for the most part, I wouldn't really expect people to be using UTF-16 except
when they need to interface with Windows system functions which require it. By
definition, char is supposed to be UTF-8, wchar is supposed to be UTF-16, and
dchar is supposed to be UTF-32. I don't think it's expected that you'll use
any other encoding within a typical D program. Sometimes it may be necessary
to translate from another encoding to UTF-8, UTF-16, or UTF-32 when getting
input from somewhere, and sometimes it may be necessary to translate from
UTF-8, UTF-16, or UTF-32 to another encoding when outputting somewhere, but
that certainly isn't the norm. It may be that we need better support for
those cases, but such conversions should happen only at input or output. So,
if you want to improve std.encoding to handle more charsets, then feel free,
but don't expect the rest of Phobos to work with anything beyond UTF-8,
UTF-16, and UTF-32. It's going to throw UtfExceptions if you try.
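As a rough illustration of "convert on input, then stay in UTF-8", here is a
minimal sketch against present-day Phobos (not the 2011 library), assuming
the incoming bytes are known to be ISO-8859-1. std.encoding.transcode and
Latin1String are real Phobos names; the helper latin1ToUtf8 is hypothetical,
and current Phobos spells the exception class UTFException.

```d
// Minimal sketch, assuming the input bytes are known to be ISO-8859-1.
// Convert once at the input boundary; the rest of the program sees only UTF-8.
import std.encoding : Latin1String, transcode;
import std.exception : assertThrown;
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: re-encode raw ISO-8859-1 bytes as a UTF-8 string.
string latin1ToUtf8(immutable(ubyte)[] raw)
{
    string utf8;
    // Every byte value is a valid Latin-1 code unit, so reinterpreting is fine.
    transcode(cast(Latin1String) raw, utf8);
    return utf8;
}

void main()
{
    // "café" as it might arrive from a Latin-1 source: 'é' is the byte 0xE9.
    immutable(ubyte)[] raw = [0x63, 0x61, 0x66, 0xE9];

    // Treated as UTF-8 without conversion, the bytes are malformed,
    // and std.utf rejects them with an exception.
    assertThrown!UTFException(validate(cast(string) raw));

    // After conversion, 'é' becomes the two-byte UTF-8 sequence C3 A9.
    string text = latin1ToUtf8(raw);
    assert(text == "caf\u00E9");
    assert(text.length == 5);
    writeln(text);
}
```

Going the other way on output is the same transcode call with the types
swapped, i.e. transcoding a string into a Latin1String.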

- Jonathan M Davis


More information about the Digitalmars-d mailing list