size of a string in bytes

rikki cattermole via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Jan 28 08:00:39 PST 2017


On 29/01/2017 4:32 AM, Nestor wrote:
> On Saturday, 28 January 2017 at 14:56:03 UTC, rikki cattermole wrote:
>> On 29/01/2017 3:51 AM, Nestor wrote:
>>> Hi,
>>>
>>> One can get the length of a string easily, however since strings are
>>> UTF-8, sometimes characters take more than one byte. I would like to
>>> know then how many bytes does a string take, but this code didn't work
>>> as I expected:
>>>
>>> import std.stdio;
>>> void main() {
>>>   string mystring1;
>>>   string mystring2 = "A string of just 48 characters for testing size.";
>>>   writeln(mystring1.sizeof);
>>>   writeln( mystring2.sizeof);
>>> }
>>>
>>> In both cases the size is 8, so apparently sizeof is giving me just the
>>> default size of a string type and not the size of the variable in
>>> memory, which is what I want.
>>>
>>> Ideas?
>>
>> A few misconceptions going on here.
>> A string element is not a grapheme; it is a char (a UTF-8 code unit), which is one byte.
>>
>> So what you want is mystring.length
>>
>> Now sizeof is not telling you about the elements, it's telling you how
>> big the reference to it is. Specifically length + pointer. It would
>> have been 16 if you compiled in 64bit mode for example.
>>
>> If you want to know about graphemes and code points that is another
>> story.
>> For that you'll want std.uni[0] and std.utf[1].
>>
>> [0] http://dlang.org/phobos/std_uni.html
>> [1] http://dlang.org/phobos/std_utf.html
>
> I do not want string length or code points. Perhaps I didn't explain
> myself.
>
> I want to know variable size in memory. For example, say I have a UTF-8
> string of only 2 characters, but each of them takes 2 bytes. string
> length would be 2, but the content of the string would take 4 bytes in
> memory (excluding overhead for type size).
>
> How can I get that?

.length

You are misunderstanding: a char is always exactly one byte in size. It is a 
UTF-8 code unit, not a whole "character", so .length already counts bytes.

Check[0] for proof.

Keep in mind the definition of string[1]:
alias immutable(char)[]  string;

There is nothing fancy going on.
What you are calling "characters" are really code points or graphemes as 
per the Unicode standard. A code point can take several chars (bytes), and a 
grapheme can take several code points, but a char itself is always one byte.
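
To make the difference concrete, here is a minimal sketch (not the only way 
to do it) comparing .sizeof, .length, code points and graphemes. The helper 
byteSize and the two test strings are made up for illustration; walkLength 
comes from std.range and byGrapheme from std.uni.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// string is an alias for immutable(char)[], and a char is one byte.
static assert(is(string == immutable(char)[]));
static assert(char.sizeof == 1);

// Hypothetical helper: bytes occupied by the string's contents,
// excluding the length + pointer pair that .sizeof reports.
size_t byteSize(string s)
{
    return s.length * char.sizeof;
}

void main()
{
    // Two precomposed letters (U+00E9, U+00FC), two UTF-8 bytes each.
    string two = "\u00E9\u00FC";
    writeln(two.sizeof);                // size of the slice itself: 8 on 32-bit, 16 on 64-bit
    writeln(byteSize(two));             // 4 bytes of content
    writeln(two.walkLength);            // 2 code points
    writeln(two.byGrapheme.walkLength); // 2 graphemes

    // 'e' followed by a combining acute accent (U+0301).
    string combined = "e\u0301";
    writeln(byteSize(combined));             // 3 bytes of content
    writeln(combined.walkLength);            // 2 code points
    writeln(combined.byGrapheme.walkLength); // 1 grapheme
}
```

So for the "2 characters, 2 bytes each" case, .length is 4, which is the 
number you were after.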

[0] http://dlang.org/spec/type.html
[1] https://github.com/dlang/druntime/blob/master/src/object.d

