size of a string in bytes

ag0aep6g via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Jan 28 11:09:01 PST 2017


On Saturday, 28 January 2017 at 18:04:58 UTC, Nestor wrote:
> I believe I saw somewhere that in D a char was not necessarily 
> the same as a ubyte because chars sometimes take more than one 
> byte,

In D, a `char` is a UTF-8 code unit. Its size is one byte, 
exactly and always.
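
You can have the compiler confirm this. A minimal sketch (the `wchar` 
and `dchar` lines are only added for comparison, they are not part of 
the question):

```d
// Sizes of D's code unit types, checked at compile time.
static assert(char.sizeof == 1);   // UTF-8 code unit, same size as ubyte
static assert(ubyte.sizeof == 1);
static assert(wchar.sizeof == 2);  // UTF-16 code unit
static assert(dchar.sizeof == 4);  // UTF-32 code unit

void main() {}
```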

A `char` is not a "character" in the common meaning of the word. 
There's a more specialized word for "character" as a visual unit: 
grapheme. For example, 'Ä' is a grapheme (a visual unit, a 
"character"), but there is no single `char` for it. To encode 'Ä' 
in UTF-8, a sequence of multiple code units is used (two of them, 
0xC3 0x84, for the precomposed code point U+00C4).
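
To look at the code units yourself, something like the following 
should do. It is only a sketch and assumes the precomposed form 
U+00C4, written as an escape so the encoding of the source file 
doesn't matter; `representation` from std.string gives the 
`string`'s `char`s as `ubyte`s.

```d
import std.stdio : writefln;
import std.string : representation;

void main()
{
    // Assumption: 'Ä' as the single precomposed code point U+00C4.
    string s = "\u00C4";

    // View the UTF-8 code units as plain bytes.
    immutable(ubyte)[] bytes = s.representation;

    assert(bytes.length == 2);
    assert(bytes[0] == 0xC3 && bytes[1] == 0x84);

    // Print the bytes in hex, separated by spaces.
    writefln("%(%02X %)", bytes);
}
```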

> so since a string is an array of chars, I thought length 
> behaved like walkLength (which I had not seen), in other words, 
> that it simply returned the amount of elements in the array.

The elements of a `string` are (immutable) `char`s. That is, 
`string` is an array of UTF-8 code units. It's not an array of 
graphemes.

A `string`'s `.length` gives you the number of `char`s in it, i.e. 
the number of UTF-8 code units, i.e. the number of bytes.
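
Here's a small sketch of how the different counts can come apart. The 
example string is my own choice: 'A' followed by the combining 
diaeresis U+0308, which displays as a single 'Ä'. `walkLength` on a 
`string` counts code points (because of auto-decoding), and 
`byGrapheme` from std.uni is one way to count graphemes.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // Assumption: decomposed form, 'A' + U+0308 (combining diaeresis).
    string s = "A\u0308";

    assert(s.length == 3);                 // UTF-8 code units, i.e. bytes
    assert(s.walkLength == 2);             // code points (auto-decoded)
    assert(s.byGrapheme.walkLength == 1);  // graphemes

    // The precomposed form U+00C4 gives different counts.
    string t = "\u00C4";
    assert(t.length == 2);
    assert(t.walkLength == 1);
    assert(t.byGrapheme.walkLength == 1);
}
```

So if you want the size in bytes, `.length` is what you're looking 
for; `walkLength` answers a different question.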

