length of string result not as expected
Adam D. Ruppe
destructionator at gmail.com
Tue Aug 13 20:05:16 PDT 2013
On Wednesday, 14 August 2013 at 02:53:43 UTC, jicman wrote:
> know the exact length of the characters that I have in a char[]
> variable? Thanks.
Your code looks like D1...
In D1 or D2:
import std.utf; // provides toUTF32
dstring s2 = toUTF32(str);
writeln(s2.length); // 13
In D2 you can do it a little more efficiently, without allocating a new string, like this:
import std.range;
writeln(walkLength(str)); // 13
The reason it shows 39 instead of 13 is that char[] is UTF-8,
and Chinese characters are multi-byte in UTF-8. The .length
property gives the number of elements in the array, which for
char[] are UTF-8 code units (bytes), not characters.
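
To make the difference concrete, here is a minimal D2 sketch. The sample text "日本語" and the helper name codePointCount are my own assumptions for illustration, not taken from the original question. It shows .length on a UTF-8 string next to the UTF-16 and UTF-32 lengths and the walkLength result:

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.utf : toUTF16, toUTF32;

// Hypothetical helper (not part of Phobos): counts code points in UTF-8 text.
// walkLength decodes the string as it iterates, so it counts dchars, not bytes.
size_t codePointCount(const(char)[] s)
{
    return walkLength(s);
}

void main()
{
    // Assumed sample text: three characters, each 3 bytes in UTF-8.
    string str = "日本語";

    writeln(str.length);          // UTF-8 code units (bytes): 9
    writeln(toUTF16(str).length); // UTF-16 code units: 3
    writeln(toUTF32(str).length); // UTF-32 code units: 3
    writeln(codePointCount(str)); // code points: 3
}
```

Each of these characters takes three bytes in UTF-8, which is the same 3-to-1 ratio as the 39 versus 13 in the question.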
dstring uses UTF-32, which has a consistent size for each code
point. A code point isn't technically quite the same as a
character, but it's close enough that it works here.
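
For a case where code points and characters do differ, here is a rough D2 sketch using a combining accent. The sample string and the helper name graphemeCount are assumptions of mine, and it relies on std.uni.byGrapheme, which may not be available in older Phobos releases:

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : toUTF32;

// Hypothetical helper (not part of Phobos): counts grapheme clusters,
// i.e. what a reader would usually call "characters".
size_t graphemeCount(const(char)[] s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT, displayed as one character.
    string s = "e\u0301";

    writeln(s.length);          // UTF-8 code units: 3
    writeln(toUTF32(s).length); // code points: 2
    writeln(graphemeCount(s));  // graphemes: 1
}
```

For the Chinese text in the question each character is a single code point, so the dstring length and the grapheme count should agree.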
Bottom line though, char[] for non-English text tends to have a
longer length than you expect because a lot of characters are
multi-byte in UTF-8. If you use dstring, the length is more
consistent: one element per code point.