Character set conversions
Jonathan M Davis
jmdavisProg at gmx.com
Sun May 29 19:52:23 PDT 2011
On 2011-05-29 19:21, Adam D. Ruppe wrote:
> I've encountered some problems with other charsets recently. Phobos has
> a std.encoding that can do some useful stuff, but there's some
> encodings I've seen in the wild that it can't handle (indeed, it's
> a fairly short list that it does support)
>
> I used gnu iconv for one of my projects and it works for me, but
> I wonder:
>
> Is anyone planning to add more charset support to Phobos?
> (alternatively, am I missing something already there?)
>
>
> If no, maybe I'll do a few myself. I've never actually written code
> to do this, but it can't be rocket science. I suspect it's more
> tedious than anything else.
Well, generally the idea is that you just use UTF-8, UTF-16, or UTF-32, and
for the most part, I wouldn't really expect people to be using UTF-16 unless
they need to interface with Windows system functions which require it. By
definition, char is supposed to be UTF-8, wchar is supposed to be UTF-16, and
dchar is supposed to be UTF-32. I don't really think that it's expected that
you be using any other encodings within your typical D program. Sometimes it
may be necessary to translate from another encoding to UTF-8, UTF-16, or
UTF-32 when getting input from somewhere, and sometimes it may be necessary to
translate to another encoding from UTF-8, UTF-16, or UTF-32 when outputting
somewhere, but it certainly isn't the norm. It may be that we need better
support for dealing with those cases, but they should really only be for
converting on input or output. So, if you want to improve std.encoding to
handle more charsets, then feel free, but don't expect the rest of Phobos to
work with anything beyond UTF-8, UTF-16, and UTF-32. It's going to be throwing
UtfExceptions if you do.
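
To make the "convert only on input or output" point concrete, here is a
minimal sketch, not a definitive implementation. It assumes the foreign
encoding is ISO-8859-1, which std.encoding already supports via Latin1String
and transcode. The helper names fromLatin1, toLatin1, and isValidUtf8 are
hypothetical and exist only for this illustration. Note that current Phobos
spells the exception std.utf.UTFException.

```d
import std.encoding : Latin1String, transcode;
import std.utf : validate, UTFException;

/// Hypothetical helper: convert Latin-1 bytes to UTF-8 at the input boundary.
string fromLatin1(immutable(ubyte)[] raw)
{
    string utf8;
    // Reinterpret the raw bytes as Latin-1 code units, then transcode.
    transcode(cast(Latin1String) raw, utf8);
    return utf8;
}

/// Hypothetical helper: convert UTF-8 back to Latin-1 bytes at the output boundary.
immutable(ubyte)[] toLatin1(string utf8)
{
    Latin1String latin1;
    transcode(utf8, latin1);
    return cast(immutable(ubyte)[]) latin1;
}

/// Hypothetical helper: report whether the bytes are well-formed UTF-8.
bool isValidUtf8(immutable(ubyte)[] raw)
{
    try
    {
        // validate throws UTFException on malformed UTF-8.
        validate(cast(string) raw);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

Everything between fromLatin1 and toLatin1 can then treat the data as an
ordinary string. Passing the raw Latin-1 bytes around as if they were a
string instead is the case where validation fails, which is what
isValidUtf8 demonstrates.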
- Jonathan M Davis