string is rarely useful as a function argument

Piotr Szturmaj bncrbme at jadamspam.pl
Sat Dec 31 07:10:17 PST 2011


Timon Gehr wrote:
> Me too. I think the way we have it now is optimal. The only reason we
> are discussing this is because of fear that uneducated users will write
> code that does not take into account Unicode characters above code point
> 0x80.

+1

From D's string docs:

"char[] strings are in UTF-8 format. wchar[] strings are in UTF-16 
format. dchar[] strings are in UTF-32 format."

I would additionally add some clarifications:

char[] is an array of 8-bit code units. A Unicode code point may take up 
to 4 chars.
wchar[] is an array of 16-bit code units. A Unicode code point may take up 
to 2 wchars.
dchar[] is an array of 32-bit code units. A Unicode code point always fits 
into one dchar.
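
To make those numbers concrete, here is a small sketch that prints how many code units a few sample code points need in each format. It assumes Phobos's std.utf.codeLength; the sample code points are my own choice, not anything from the docs.

```d
import std.stdio : writefln;
import std.utf : codeLength;

void main()
{
    // Sample code points (chosen for illustration): ASCII 'a',
    // U+00E9, U+20AC (both in the BMP), and U+1D11E (outside the BMP).
    foreach (dchar c; "a\u00E9\u20AC\U0001D11E"d)
    {
        // codeLength!C(c) is the number of code units of type C
        // needed to encode the code point c.
        writefln("U+%05X: %s char(s), %s wchar(s), 1 dchar",
                 cast(uint) c, codeLength!char(c), codeLength!wchar(c));
    }
}
```

The char count climbs from 1 to 4 across these samples, and only the last one needs 2 wchars.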

Each of these formats may encode any Unicode string.

If you need indexing or slicing, use:
* char[] or string when working only with ASCII code points.
* wchar[] or wstring when working only with Basic Multilingual Plane (BMP) 
code points.
* dchar[] or dstring when working with all possible code points.
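
The reason for the list above is that indexing and slicing operate on code units, not code points. A minimal sketch of what goes wrong with a non-ASCII character in a string; isValidUtf8 is a hypothetical helper of mine that wraps std.utf.validate.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);   // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string  s = "h\u00E9llo";    // U+00E9 takes two chars in UTF-8
    dstring d = "h\u00E9llo"d;   // every code point is one dchar

    writeln(s.length);               // 6 code units
    writeln(d.length);               // 5 code units

    // s[1] is only the first byte of U+00E9, not the character.
    writeln(cast(ubyte) s[1]);       // 195 (0xC3)
    // d[1] is the whole code point.
    writeln(d[1] == '\u00E9');       // true

    // Slicing by code unit can cut a sequence in half.
    writeln(isValidUtf8(s[0 .. 2])); // false: ends inside U+00E9
    writeln(isValidUtf8(s[0 .. 3])); // true: "h" plus all of U+00E9
}
```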

If you do not need indexing or slicing you may use any of the formats.
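
For example, foreach with a dchar loop variable decodes code points from any of the three formats, so sequential processing does not care which one you picked. A rough sketch; countCodePoints is a name I made up for illustration.

```d
import std.stdio : writeln;

// Hypothetical helper: counts code points by letting foreach decode
// the string one dchar at a time, whatever the code unit type is.
size_t countCodePoints(S)(S s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // The same four code points in each format.
    string  s8  = "a\u00E9\u20AC\U0001D11E";
    wstring s16 = "a\u00E9\u20AC\U0001D11E"w;
    dstring s32 = "a\u00E9\u20AC\U0001D11E"d;

    writeln(countCodePoints(s8));   // 4
    writeln(countCodePoints(s16));  // 4
    writeln(countCodePoints(s32));  // 4
}
```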

