length of string result not as expected

Adam D. Ruppe destructionator at gmail.com
Tue Aug 13 20:05:16 PDT 2013


On Wednesday, 14 August 2013 at 02:53:43 UTC, jicman wrote:
> know the exact length of the characters that I have in a char[] 
> variable?  Thanks.

Your code looks like D1...

in D1 or D2:
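import std.stdio;
import std.utf; // toUTF32 lives in std.utf, not std.uni
dstring s2 = toUTF32(str);
writeln(s2.length); // 13 (on D1, use writefln instead of writeln)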


in D2 you can do it a little more efficiently like this:

import std.range;
writeln(walkLength(str)); // 13



The reason it shows 39 instead of 13 is that the char[] is UTF-8, 
and Chinese characters are multi-byte in UTF-8: three bytes each 
here, so 13 * 3 = 39. The .length property gives the number of 
elements in the array, which for char[] are UTF-8 code units 
(bytes).
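
To make the byte count versus character count difference concrete, 
here is a minimal D2 sketch. The sample string is a made-up stand-in 
(four Chinese characters), not the string from the original 
question, so the numbers differ from the 39 and 13 above.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.utf : toUTF32;

void main()
{
    // Hypothetical sample text, chosen only for illustration.
    string str = "你好世界";

    // .length counts array elements, i.e. UTF-8 code units (bytes).
    writeln(str.length);          // 12

    // walkLength decodes the string and counts code points.
    writeln(str.walkLength);      // 4

    // toUTF32 allocates a dstring with one element per code point.
    writeln(toUTF32(str).length); // 4
}
```

Each of these four characters takes three bytes in UTF-8, which is 
where the factor of three comes from.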

dstring uses UTF-32, which has a consistent size for each code 
point. A code point isn't technically quite the same as a 
character, but it's close enough that it works here.
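
A case where code points and characters do differ, as a rough 
sketch: a letter followed by a combining accent is two code points 
but displays as one character. This assumes a D2 Phobos recent 
enough to provide std.uni.byGrapheme, and the sample string is 
hypothetical.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // Hypothetical example: "e" followed by U+0301 COMBINING ACUTE ACCENT.
    string s = "e\u0301";

    writeln(s.length);                // 3 (UTF-8 bytes: 1 + 2)
    writeln(s.walkLength);            // 2 (code points)
    writeln(s.byGrapheme.walkLength); // 1 (user-perceived character)
}
```

For the Chinese text in this thread each character is a single code 
point, so counting code points is enough.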


Bottom line though, char[] for non-English text tends to have a 
longer length than you expect because a lot of characters are 
multi-byte in UTF-8. If you use dstring, the length is one element 
per code point, so it is more consistent.

