Wide characters support in D

Ruslan Nikolaev nruslan_devel at yahoo.com
Mon Jun 7 16:51:59 PDT 2010


Just one more addition: it is possible to have a built-in function that converts a multibyte (or multiword) char sequence to a dchar (UTF-32) character, even though in my proposal the code unit can differ in size between platforms. Again, my only point is that it would be nice to have something similar to TCHAR, so that all libraries can use it if they choose not to provide functions for all 3 types.
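
For illustration, here is a minimal sketch of such a conversion using std.utf.decode from Phobos. The helper name firstCodePoint is hypothetical, and the sketch assumes a non-empty, well-formed string; it is only meant to show that the same call yields a dchar whatever the code unit width is.

```d
import std.utf : decode;

// Hypothetical helper: decode the first code point of a string,
// whatever its code unit width. Assumes s is non-empty and valid.
dchar firstCodePoint(S)(S s)
{
    size_t i = 0;
    return decode(s, i); // advances i past the whole multi-unit sequence
}

void main()
{
    // U+1D11E takes 4 UTF-8 code units, 2 UTF-16 code units, 1 UTF-32 code unit.
    assert(firstCodePoint("\U0001D11E"c) == 0x1D11E);
    assert(firstCodePoint("\U0001D11E"w) == 0x1D11E);
    assert(firstCodePoint("\U0001D11E"d) == 0x1D11E);
}
```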

To Walter:
Yes, programmers often ignore surrogate pairs in the case of UTF-16. But with an undetermined char size (1 or 2 bytes), they will have to use the special built-in conversion functions to dchar, unless they want their code to be completely broken.
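
As a rough sketch of what I mean, the names tchar and tstring below are hypothetical (they are not part of D or Phobos), and the choice of UTF-16 on Windows and UTF-8 elsewhere is just an assumption for the example. Code that iterates by dchar works for either width, while code that indexes code units directly does not.

```d
// Hypothetical platform-dependent code unit, in the spirit of Windows TCHAR.
version (Windows)
    alias tchar = wchar; // assumption: UTF-16 where the OS API is UTF-16
else
    alias tchar = char;  // assumption: UTF-8 elsewhere

alias tstring = immutable(tchar)[];

// Count characters (code points), not code units.
size_t countCodePoints(const(tchar)[] s)
{
    size_t n = 0;
    foreach (dchar c; s) // the language decodes each full sequence to a dchar
        ++n;
    return n;
}

void main()
{
    tstring s = "a\U0001D11Eb";      // the literal adapts to either width
    assert(countCodePoints(s) == 3); // three characters on every platform
    assert(s.length > 3);            // but more code units than characters
    // Indexing yields one code unit, which is not the character itself.
    assert(cast(dchar) s[1] != '\U0001D11E');
}
```

The element size is still available as tchar.sizeof whenever it is needed, e.g. for allocation.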

Thanks,
Ruslan. 

--- On Tue, 6/8/10, Ruslan Nikolaev <nruslan_devel at yahoo.com> wrote:

> From: Ruslan Nikolaev <nruslan_devel at yahoo.com>
> Subject: Re: Wide characters support in D
> To: "digitalmars.D" <digitalmars-d at puremagic.com>
> Date: Tuesday, June 8, 2010, 3:16 AM
> OK, OK... that was just a suggestion. Thanks for the reply
> about the "Hello world" representation. Were the "w" and "d"
> postfixes there from the start, or were they added recently?
> I did not know about them. I thought D converted string
> literals automatically.
> 
> Yes, templates may help. However, they make the code
> unnecessarily bigger (since we have to compile it for every
> char type). The other problem is that they let the
> programmer choose which type to use. He or she may simply
> prefer char[] as UTF-8 (or wchar[] as UTF-16). That will be
> fine on a platform that supports this encoding natively
> (e.g. for file system operations, screen output, etc.),
> whereas it will cause conversion overhead on the others.
> It is not a big overhead, but it is an unnecessary one.
> Having said this, I do agree that there must be some
> flexibility (e.g. in Java, a char is always 2 bytes);
> however, I don't believe that this flexibility should be
> available to the application programmer.
> 
> I don't think there is any problem with char having a
> different size on different platforms. In fact, that would
> make programs better (since application programmers will
> have to think in terms of characters as opposed to bytes).
> System programmers (i.e. OS programmers) can still assume
> the width they expect (since a char width option can be
> added to the compiler). TCHAR in Windows is a good example
> of this. Whenever you need to determine the size of an
> element (e.g. for allocation), you can use 'sizeof'. Again,
> it does not mean that you're deprived of the
> char/wchar/dchar capability. It can still be supported
> (e.g. via ubyte/ushort/uint) for the sake of
> interoperability or some special cases. Special string
> constants (e.g. ""b, ""w, ""d) can be supported, too. My
> only point is that it would be good to have a universal
> char type that depends on the platform. That, in turn,
> allows a unified char for all libraries on this platform.
> 
> In addition, commonly used constants '\n', '\r', '\t' will
> be the same regardless of char width.
> 
> Anyway, that was just a suggestion. You may disagree with
> this if you wish.
> 
> Ruslan.