Some questions about strings

Ali Çehreli acehreli at yahoo.com
Mon Jun 22 03:31:17 UTC 2020


On 6/21/20 8:17 PM, Denis wrote:

 > I have a few questions about how strings are stored.
 >
 > - First, is there any difference between string, wstring and dstring?

string is immutable(char)[]
wstring is immutable(wchar)[]
dstring is immutable(dchar)[]

char is 1 byte: UTF-8 code unit
wchar is 2 bytes: UTF-16 code unit
dchar is 4 bytes: UTF-32 code unit
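
The aliases and sizes above can be checked at compile time. This is only a minimal sketch; it assumes nothing beyond the built-in types:

```d
// The three string types are aliases for arrays of immutable code units.
static assert(is(string  == immutable(char)[]));
static assert(is(wstring == immutable(wchar)[]));
static assert(is(dstring == immutable(dchar)[]));

// Size of one code unit for each encoding.
static assert(char.sizeof  == 1); // UTF-8
static assert(wchar.sizeof == 2); // UTF-16
static assert(dchar.sizeof == 4); // UTF-32

void main() {}
```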

 > For example, a 3-byte Unicode character literal can be assigned to a
 > variable of any of these types, then printed, etc, without errors.

You can reveal some of the mystery by looking at their .length property, 
which counts code units, not characters. Additionally, foreach will visit 
these types element by element: char, wchar, and dchar, respectively.
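
A small sketch of that experiment, assuming the 3-byte character is the euro sign (U+20AC, my choice for illustration; it is E2 82 AC in UTF-8):

```d
import std.stdio : writefln, writeln;

void main()
{
    // The same code point stored in each of the three encodings.
    string  s = "\u20AC";
    wstring w = "\u20AC"w;
    dstring d = "\u20AC"d;

    // .length is the number of code units, not characters.
    writeln(s.length, " ", w.length, " ", d.length); // prints "3 1 1"

    // foreach visits code units: three chars here.
    foreach (c; s)
    {
        static assert(typeof(c).sizeof == 1);
        writefln("%02x", cast(ubyte) c); // e2, then 82, then ac
    }

    // One wchar and one dchar, respectively.
    foreach (c; w) static assert(typeof(c).sizeof == 2);
    foreach (c; d) static assert(typeof(c).sizeof == 4);

    // Asking for dchar explicitly makes foreach decode the UTF-8.
    size_t codePoints = 0;
    foreach (dchar c; s) ++codePoints;
    assert(codePoints == 1);
}
```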

 > - Are the characters of a string stored in memory by their Unicode
 > codepoint(s), as opposed to some other encoding?

As UTF encodings; nothing else.

 > - Assuming that the answer to the first question is "no difference", do
 > strings always allocate 4 bytes per codepoint?

No. They always allocate sufficient bytes to represent the code points 
in their respective UTF encodings. dstring is the only one where the 
number of code points equals the number of elements: UTF-32 code units, 
each being 4 bytes.
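
To make the storage cost concrete, here is a sketch with four code points that need 1, 2, 3 and 4 bytes in UTF-8. The sample text is my own; std.range.walkLength is used only to count code points:

```d
import std.range : walkLength;

void main()
{
    // U+0061, U+00E9, U+20AC and U+1F600: four code points in total.
    string  s = "a\u00E9\u20AC\U0001F600";
    wstring w = "a\u00E9\u20AC\U0001F600"w;
    dstring d = "a\u00E9\u20AC\U0001F600"d;

    // Same number of code points in every encoding.
    assert(s.walkLength == 4);
    assert(w.walkLength == 4);
    assert(d.walkLength == 4);

    // Different numbers of code units, hence different byte counts.
    assert(s.length == 10 && s.length * char.sizeof  == 10); // 1+2+3+4
    assert(w.length == 5  && w.length * wchar.sizeof == 10); // 1+1+1+2
    assert(d.length == 4  && d.length * dchar.sizeof == 16); // 1+1+1+1
}
```

Only for the dstring does .length match the code point count.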

 > - Can a series of codepoints, appropriately padded to the required
 > width, and terminated by a null character,

A null character is not required, but it may be a part of a string.

 > be directly assigned to a
 > string WITHOUT GOING THROUGH A DECODING / ENCODING TRANSLATION?

It will go through decoding/encoding.

 > The last question gets to the heart of what I'd ultimately like to
 > accomplish and avoid.
 >
 > Thanks for your help.

There is also the infamous "auto decoding" of Phobos algorithms (which 
is seen as a mistake). I think one tool to get away from auto decoding of 
strings is std.string.representation:

   https://dlang.org/phobos/std_string.html#.representation

Because it returns a type that is not a string, there is no auto 
decoding to speak of. :)
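
A minimal sketch of the difference, again assuming the euro sign as sample text:

```d
import std.range : walkLength;
import std.range.primitives : front;
import std.string : representation;

void main()
{
    string s = "\u20AC";           // one code point, 3 UTF-8 code units
    auto bytes = s.representation; // same memory, viewed as ubytes

    static assert(is(typeof(bytes) == immutable(ubyte)[]));

    // As a range, a string is auto decoded: front is a dchar.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == '\u20AC');
    assert(s.walkLength == 1);

    // The representation is a plain array: front is the first byte.
    static assert(is(typeof(bytes.front) : ubyte));
    assert(bytes.front == 0xE2);
    assert(bytes.walkLength == 3);

    // No copy was made; both slices point at the same data.
    assert(cast(const(void)*) bytes.ptr == cast(const(void)*) s.ptr);
}
```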

Ali


