size of a string in bytes

Nestor via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Jan 28 16:58:56 PST 2017


On Saturday, 28 January 2017 at 19:09:01 UTC, ag0aep6g wrote:
> In D, a `char` is a UTF-8 code unit. Its size is one byte, 
> exactly and always.
>
> A `char` is not a "character" in the common meaning of the 
> word. There's a more specialized word for "character" as a 
> visual unit: grapheme. For example, 'Ä' is a grapheme (a visual 
> unit, a "character"), but there is no single `char` for it. To 
> encode 'Ä' in UTF-8, a sequence of multiple code units is used.
> 
> ...
> 
> The elements of a `string` are (immutable) `char`s. That is, 
> `string` is an array of UTF-8 code units. It's not an array of 
> graphemes.
>
> A `string`'s .length gives you the number of `char`s in it, 
> i.e. the number of UTF-8 code units, i.e. the number of bytes.

Very good explanation.
Thank you all for making this clear to me.
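For illustration, here is a small D sketch of the three different counts discussed in the quote (bytes, code points, graphemes), using the 'Ä' example from above. It assumes a Phobos version in which narrow strings are still auto-decoded, so that `std.range.walkLength` on a `string` iterates by code point. `std.uni.byGrapheme` then groups code points into graphemes. The two variable names are only illustrative.

```d
import std.stdio : writeln;
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'Ä' as one precomposed code point U+00C4 (UTF-8 bytes: 0xC3 0x84).
    string precomposed = "\u00C4";
    // 'Ä' as 'A' followed by U+0308 COMBINING DIAERESIS (UTF-8 bytes: 0x41 0xCC 0x88).
    string decomposed = "A\u0308";

    // .length counts chars, i.e. UTF-8 code units, i.e. bytes.
    writeln(precomposed.length); // 2
    writeln(decomposed.length);  // 3

    // walkLength iterates a string by code point (dchar) because of auto-decoding.
    writeln(precomposed.walkLength); // 1
    writeln(decomposed.walkLength);  // 2

    // byGrapheme groups code points into user-perceived characters.
    writeln(precomposed.byGrapheme.walkLength); // 1
    writeln(decomposed.byGrapheme.walkLength);  // 1
}
```

Both strings render as 'Ä', yet their `.length` differs, which is why `.length` is best read as a byte count rather than a character count.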

