D and Unicode(UTF16) strings

Lionello Lunesu lionello at lunesu.remove.com
Thu Jul 24 20:12:14 PDT 2008


"Vincent Richomme" <forumer at smartmobili.com> wrote in message 
news:g6b5ne$1anf$1 at digitalmars.com...
> Hi,
>
> Would it be possible to add a wstring type that represents a UTF-16 
> string? On Windows you can compile as ANSI or UNICODE, and you have 
> the standard char* as well as wchar_t*.
>
> I saw that in D string is an alias for char[]. Would it be possible to do 
> the same for wchar[] and define wstring in the core language?

That's already the case: object.d in the dmd distribution defines wstring 
next to string.
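
A quick way to check this without opening object.d is a few static 
asserts. This is only a sketch and assumes a current D2 compiler, where 
the aliases are immutable arrays; with D1 they would be plain char[] and 
wchar[] as described in the question.

```d
// wstring and dstring are declared in object.d next to string,
// so no import is needed to use them.
// Assumption: D2, where the string aliases are immutable arrays.
static assert(is(string  == immutable(char)[]));
static assert(is(wstring == immutable(wchar)[]));
static assert(is(dstring == immutable(dchar)[]));

void main()
{
    wstring w = "héllo"w;         // the w suffix makes a UTF-16 literal
    assert(w.length == 5);        // five UTF-16 code units
    assert("héllo".length == 6);  // the same text needs six UTF-8 code units
}
```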

> That would allow declaring an alias like this:
>
> version (Unicode)
> {
>     alias wstring tstring;
> }
> else
> {
>     alias string tstring;
> }

Coming from C++ that might seem like a good idea at first, but note that 
Windows doesn't really know about UTF-8. It can convert UTF-8 to UTF-16 
(what Windows calls UNICODE) and back, but apart from the 
MultiByteToWideChar-like functions you cannot pass UTF-8 (i.e. string, 
char[]) to any ANSI Windows API.
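
On the D side the conversion can be done with std.utf rather than 
MultiByteToWideChar. A minimal sketch, assuming a current Phobos; 
toWinString is a hypothetical helper name, not something from Phobos or 
the Windows headers:

```d
import std.utf : toUTF16, toUTF16z, toUTF8;

// Hypothetical helper: widen UTF-8 text into a zero-terminated UTF-16
// buffer, which is the form the ...W functions expect.
const(wchar)* toWinString(string utf8)
{
    return toUTF16z(utf8);
}

void main()
{
    string  s = "héllo";          // UTF-8: six code units
    wstring w = toUTF16(s);       // UTF-16: five code units
    assert(w.length == 5);
    assert(toUTF8(w) == s);       // the round trip is lossless

    const(wchar)* p = toWinString(s);
    assert(p[0] == 'h');
    assert(p[5] == 0);            // terminator after the five code units
}
```

The pointer returned by toWinString is what you would hand to a Unicode 
(W) Windows function in place of a char* argument.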

The ANSI functions all use the current thread code page for conversion, which 
cannot be set to UTF-8. (God knows I've tried. If anybody has managed to do 
this, please let me know how.)

I'd suggest sticking with wstring/Unicode. Most Unicode APIs are also 
available on Win95, so there should be little reason to use the ANSI 
functions in any Windows application. Trying to use UTF-8 on Windows means 
that you'll either have to constantly convert the UTF-8 strings to Unicode 
yourself, or use byte[] instead of "string" to prevent errors when using 
Phobos/Tango APIs that assume char[]/string contains UTF-8.
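
To illustrate why ANSI text should not live in a "string": the same word 
encoded in a code page is not valid UTF-8, and std.utf rejects it. A rough 
sketch, assuming a current Phobos; isValidUtf8 is a hypothetical helper, 
and it uses ubyte[] where I wrote byte[] above:

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if the bytes form well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    ubyte[] ansi = [0x63, 0x61, 0x66, 0xE9];        // "café" in Windows-1252
    ubyte[] utf8 = [0x63, 0x61, 0x66, 0xC3, 0xA9];  // "café" in UTF-8

    assert(!isValidUtf8(ansi));  // 0xE9 starts a multi-byte sequence that never completes
    assert(isValidUtf8(utf8));
}
```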

Anyway, that's what I've found out while messing with Unicode/ANSI stuff on 
Windows. It might even be outdated at this point.

L. 




More information about the Digitalmars-d mailing list