Some questions about strings
Ali Çehreli
acehreli at yahoo.com
Mon Jun 22 03:31:17 UTC 2020
On 6/21/20 8:17 PM, Denis wrote:
> I have a few questions about how
> strings are stored.
>
> - First, is there any difference between string, wstring and dstring?
string is immutable(char)[]
wstring is immutable(wchar)[]
dstring is immutable(dchar)[]
char is 1 byte: UTF-8 code unit
wchar is 2 bytes: UTF-16 code unit
dchar is 4 bytes: UTF-32 code unit
> For example, a 3-byte Unicode character literal can be assigned to a
> variable of any of these types, then printed, etc, without errors.
You can reveal some of the mystery by looking at their .length property.
Additionally, foreach will visit these types element-by-element: char,
wchar, and dchar, respectively.
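
To make that concrete, here is a minimal sketch that assigns the same 3-byte character (U+20AC, the euro sign, chosen only as an example) to all three types and inspects .length and the elements that foreach visits. The helper name unitsOf is hypothetical and exists only for this illustration.

```d
import std.stdio : writefln, writeln;

// Hypothetical helper: collects the elements that foreach visits.
// The element type is inferred as char, wchar, or dchar (no decoding).
uint[] unitsOf(S)(S s)
{
    uint[] result;
    foreach (c; s)
        result ~= cast(uint) c;
    return result;
}

void main()
{
    // The same literal is accepted by all three types.
    string  s = "\u20AC";
    wstring w = "\u20AC";
    dstring d = "\u20AC";

    // .length counts code units, not characters.
    writeln(s.length, " ", w.length, " ", d.length); // prints "3 1 1"

    writefln("%(%02X %)", unitsOf(s)); // prints "E2 82 AC"
    writefln("%(%04X %)", unitsOf(w)); // prints "20AC"

    // Asking foreach for dchar decodes the UTF-8 on the fly.
    size_t codePoints;
    foreach (dchar c; s)
        ++codePoints;
    writeln(codePoints); // prints "1"
}
```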
> - Are the characters of a string stored in memory by their Unicode
> codepoint(s), as opposed to some other encoding?
As UTF encodings; nothing else.
> - Assuming that the answer to the first question is "no difference", do
> strings always allocate 4 bytes per codepoint?
No. They always allocate sufficient bytes to represent the code points
in their respective UTF encodings. dstring is the only one where the
number of code points equals the number of elements: UTF-32 code units,
each being 4 bytes.
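
A small sketch of the storage cost, assuming a sample text of three code points (U+0061, U+20AC, U+1F600) picked for illustration. std.utf.count reports code points; payloadBytes is a hypothetical helper that multiplies the element count by the element size.

```d
import std.stdio : writefln;
import std.utf : count;

// Hypothetical helper: bytes occupied by the elements of a string.
size_t payloadBytes(S)(S s)
{
    return s.length * typeof(s[0]).sizeof;
}

void main()
{
    static assert(char.sizeof == 1 && wchar.sizeof == 2 && dchar.sizeof == 4);

    // 'a' (U+0061), euro sign (U+20AC), grinning face (U+1F600)
    string  s = "a\u20AC\U0001F600";
    wstring w = "a\u20AC\U0001F600";
    dstring d = "a\u20AC\U0001F600";

    // Every encoding holds the same three code points...
    assert(count(s) == 3 && count(w) == 3 && count(d) == 3);

    // ...but only dstring has one element per code point.
    writefln("%s %s %s", s.length, w.length, d.length); // prints "8 4 3"

    // Bytes actually needed for the elements.
    writefln("%s %s %s", payloadBytes(s), payloadBytes(w), payloadBytes(d)); // prints "8 8 12"
}
```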
> - Can a series of codepoints, appropriately padded to the required
> width, and terminated by a null character,
A null character is not required, but it may be part of a string.
> be directly assigned to a
> string WITHOUT GOING THROUGH A DECODING / ENCODING TRANSLATION?
It will go through decoding/encoding.
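
As an illustration of that translation, here is a sketch assuming the code points start out as a dstring (one 4-byte element per code point). It uses std.utf.toUTF8 and std.utf.toUTF16 for the conversion; this is only one of several ways to convert and is not necessarily what the original poster had in mind.

```d
import std.utf : toUTF8, toUTF16;

void main()
{
    // Two code points, each stored as one 4-byte element.
    dstring codePoints = "A\u20AC"d;
    assert(codePoints.length == 2);

    // A dstring variable does not implicitly convert to string;
    // the data has to be re-encoded.
    string  s = toUTF8(codePoints);
    wstring w = toUTF16(codePoints);

    assert(s.length == 4); // 1 byte for 'A' + 3 bytes for U+20AC
    assert(w.length == 2); // both code points fit in one UTF-16 unit each
}
```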
> The last question gets to the heart of what I'd ultimately like to
> accomplish and avoid.
>
> Thanks for your help.
There is also the infamous "auto decoding" of Phobos algorithms (which
is seen as a mistake). I think one tool to get away from auto decoding of
strings is std.string.representation:
https://dlang.org/phobos/std_string.html#.representation
Because it returns a type that is not a string, there is no auto
decoding to speak of. :)
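
A minimal sketch of the difference, using std.range.walkLength as a stand-in for any Phobos range algorithm (that choice, and the sample character U+20AC, are assumptions made for the example):

```d
import std.range : walkLength;
import std.string : representation;

void main()
{
    string s = "\u20AC"; // one code point, three UTF-8 code units

    // As a range, a string is auto decoded: one dchar element.
    assert(s.walkLength == 1);

    // representation exposes the same data as immutable(ubyte)[].
    auto bytes = s.representation;
    static assert(is(typeof(bytes) == immutable(ubyte)[]));

    // No decoding here: the algorithm sees three plain bytes.
    assert(bytes.walkLength == 3);

    immutable(ubyte)[] expected = [0xE2, 0x82, 0xAC];
    assert(bytes == expected);
}
```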
Ali