[D-runtime] Wide characters in D
Ruslan Nikolaev
nruslan_devel at yahoo.com
Wed Jun 9 10:24:48 PDT 2010
We moved our discussion to the general D language mailing list, where the rationale for having tchar and some problems with templates were discussed.
Thank you!
Ruslan.
--- On Wed, 6/9/10, Sean Kelly <sean at invisibleduck.org> wrote:
> From: Sean Kelly <sean at invisibleduck.org>
> Subject: Re: [D-runtime] Wide characters in D
> To: "D's runtime library developers list" <d-runtime at puremagic.com>
> Date: Wednesday, June 9, 2010, 8:55 PM
> On Jun 6, 2010, at 5:00 PM, Ruslan Nikolaev wrote:
>
> > Hi. I am new to D. It looks like D supports three character types:
> > char, wchar and dchar. This is cool; however, I have some questions
> > about it:
> >
> > 1. When there are two methods (one taking wchar[] and another taking
> > char[]), how does D determine which one to use if I pass the string
> > literal "hello world"?
>
> You'll get an overload error because an unqualified string
> literal converts to both string and wstring. You'd
> have to either cast or use "hello world"c or "hello world"w
> to call the desired routine.
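A minimal sketch of the suffix approach, written in present-day D2 syntax (an assumption: 2010-era compilers may differ in details, including exactly how a bare literal is resolved). The overload names and return values are hypothetical, chosen only so the caller can see which overload ran; the `c` and `w` postfixes give the literal a fixed type, so only one overload matches.

```d
import std.stdio : writeln;

// Hypothetical overloads that differ only in the character width of the
// argument. Each reports the code-unit size in bits, so the caller can
// see which one was selected.
int width(string s)  { return 8; }   // string  = immutable(char)[],  UTF-8
int width(wstring s) { return 16; }  // wstring = immutable(wchar)[], UTF-16

void main()
{
    // A postfix fixes the literal's type, so exactly one overload matches.
    writeln(width("hello world"c)); // prints 8
    writeln(width("hello world"w)); // prints 16
}
```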
>
> > 2. Many libraries (e.g. Tango or Phobos) don't provide
> > functions/methods for wchar/dchar, or support them only partially;
> > e.g. writefln probably assumes char[] for format strings like
> > "Number %d...".
>
> I think writefln will actually accept any kind of
> string. That's how the code looks anyway, though I've
> never tried anything but utf-8.
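A quick way to check the argument side of this with present-day Phobos (an assumption about the library version; the code Sean looked at in 2010 may behave differently). It passes UTF-16 and UTF-32 strings to a UTF-8 format string; `%s` writes them out transcoded, and `std.format.format` builds the same text in memory so the result can be checked.

```d
import std.format : format;
import std.stdio : writefln;

void main()
{
    wstring w = "wide"w;   // UTF-16
    dstring d = "wider"d;  // UTF-32

    // UTF-8 format string with UTF-16 and UTF-32 arguments:
    // %s transcodes each argument to the UTF-8 output.
    writefln("Number %d: %s / %s", 42, w, d);

    // The same formatting into a string, to check the result.
    string s = format("Number %d: %s / %s", 42, w, d);
    assert(s == "Number 42: wide / wider");
}
```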
>
> > 3. Even where they do, it is annoying to provide methods for all three
> > character types, especially if we want to use the native encoding
> > (e.g. wchar is better on Windows, char is better on Linux). For
> > example, Windows has _wopen, _wdirent, _wreaddir, _wopendir,
> > wmain(int argc, wchar_t *argv[]) and so on, and they should be native
> > (in the sense that no conversion is necessary when we call, for
> > instance, _wopen). Linux doesn't have them, as UTF-8 is widely used
> > there.
>
> Templates should largely take care of this for library
> functions. It's rare that an algorithm has to know
> it's working with a string of characters.
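A minimal sketch of that point in present-day D2: one template covers all three code-unit widths, with the element type `C` inferred from the argument. The function name `occurrences` is hypothetical; `foreach (dchar c; s)` decodes whole code points whether the input is UTF-8, UTF-16 or UTF-32.

```d
import std.stdio : writeln;
import std.traits : isSomeChar;

// Hypothetical helper: counts how often a code point occurs.
// C is inferred as char, wchar or dchar from the argument.
size_t occurrences(C)(const(C)[] s, dchar needle)
    if (isSomeChar!C)
{
    size_t n = 0;
    foreach (dchar c; s)   // decodes whole code points, whatever C is
        if (c == needle)
            ++n;
    return n;
}

void main()
{
    writeln(occurrences("a€b€"c, '€')); // 2
    writeln(occurrences("a€b€"w, '€')); // 2
    writeln(occurrences("a€b€"d, '€')); // 2
}
```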
>
> > Since D is targeted at systems programming, why not use whatever
> > works better on a particular system (e.g. char would be 2 bytes on
> > Windows and 1 byte on Linux; this could be a compiler switch, and all
> > libraries could be compiled accordingly on a particular system)? All
> > three character types would still be needed for interoperating with
> > C, but in those cases byte, short and int would do the job. For this
> > kind of situation, it would be nice to have some built-in functions
> > for transparent conversion from char to byte/short/int and vice versa
> > (especially if the conversion only happens when needed on a
> > particular platform).
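A rough approximation of the proposed "native char" without a compiler switch, assuming present-day D2 syntax: a `version` block picks the code-unit type at compile time. The names `tchar`, `tstring` and `toNative` are hypothetical and not part of D or Phobos; `std.conv.to` does the UTF transcoding when the widths differ.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical tchar: the platform's preferred code-unit type,
// chosen at compile time (not part of D or Phobos).
version (Windows)
    alias tchar = wchar;  // Windows "W" APIs take UTF-16
else
    alias tchar = char;   // POSIX APIs take bytes; UTF-8 is customary

alias tstring = immutable(tchar)[];

// Hypothetical helper: transcode any string to the native width.
tstring toNative(S)(S s)
{
    return to!tstring(s);
}

void main()
{
    tstring name = toNative("héllo");
    // Prints "2 5" on Windows (UTF-16) and "1 6" elsewhere (UTF-8).
    writeln(tchar.sizeof, " ", name.length);
}
```

Unlike the proposal, this keeps char, wchar and dchar distinct and only adds aliases, so library code still has to be written generically (or templated) to accept whichever width `tchar` resolves to.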
>
> Casting? Or do you mean codepage conversions?
> Personally, I'd rather use a specific encoding internally
> and if necessary convert during IO. If you want your
> app to be portable you won't be able to use a single
> encoding throughout anyway--you'll need utf-8 for IO on
> Posix, utf-16 for IO on Windows, etc.
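A minimal sketch of "one encoding internally, convert during IO" in present-day D2: the program keeps UTF-8 strings and transcodes only at a boundary that wants UTF-16. The file name is illustrative; `std.conv.to` and `std.utf.toUTF16z` are the Phobos conversions used here, and no actual Windows API is called.

```d
import std.conv : to;
import std.stdio : writeln;
import std.utf : toUTF16z;

void main()
{
    // Internally the program keeps one encoding: UTF-8.
    string name = "résumé.txt";

    // At an I/O boundary that wants UTF-16 (e.g. a Windows "W" API),
    // transcode once; to!wstring converts UTF-8 to UTF-16.
    wstring wide = to!wstring(name);

    // C-style wide APIs usually want a NUL-terminated pointer instead.
    const(wchar)* cwide = toUTF16z(name);
    assert(cwide[wide.length] == 0);

    writeln(name.length); // 12: each 'é' is two UTF-8 code units
    writeln(wide.length); // 10: each 'é' is one UTF-16 code unit
}
```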
>
> > In my opinion, separating the notion of a character from that of a
> > byte would be nice, and it makes sense because a given platform
> > natively uses either UTF-8 or UTF-16. Programmers could then write
> > universal code (like TCHAR on Windows). Unfortunately, C uses 'char'
> > and 'byte' interchangeably, but why does D have to repeat this
> > mistake?
>
> Working with multibyte characters is computationally expensive and
> often unnecessary. Plus, it makes things a bit weird in a systems
> language. If I have a char*, it seems like dereferencing the pointer
> would give me a 4-byte value back, with the last 0-3 bytes being zero?
> I really don't know how this would work.
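For contrast, a sketch of how D behaves today (present-day D2 and Phobos assumed): dereferencing a char pointer yields a single UTF-8 code unit, and decoding a full code point is a separate, explicit call to `std.utf.decode`.

```d
import std.stdio : writeln;
import std.utf : decode;

void main()
{
    string s = "€uro";           // '€' (U+20AC) is three bytes in UTF-8
    immutable(char)* p = s.ptr;

    // Dereferencing a char pointer yields one UTF-8 code unit,
    // not a whole character, so no 4-byte value appears.
    writeln(cast(ubyte) *p);     // 226 (0xE2), the first byte of '€'

    // Getting the full code point is an explicit, separate step.
    size_t i = 0;
    dchar c = decode(s, i);      // advances i past the whole sequence
    writeln(cast(uint) c, " ", i); // 8364 3
}
```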
>
> For codepage conversions and the like, I've had tremendous
> success with libicu. I don't know that a binding for
> it is appropriate for Phobos, but I'd love to see a
> well-maintained project for this on dsource.
> _______________________________________________
> D-runtime mailing list
> D-runtime at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/d-runtime
>