[phobos] UTF-8 string slicing
Jonathan M Davis
jmdavisProg at gmx.com
Sat Aug 20 16:51:05 PDT 2011
On Saturday, August 20, 2011 13:11:44 unDEFER wrote:
> Big thanks, Jonathan!
> You gave me very clear explanations.
> But what do you mean by "strings of char and wchar ... have no length
> property" if "string.length" really works? Is it a bug?
All arrays have a length property. It returns the number of elements in the
array. The issue is std.range.hasLength, which is what is used with range-
based functions in template constraints and static ifs. hasLength is true for
all arrays _except_ for arrays of char and wchar. This is because range-based
functions treat strings as ranges of dchar - of code points - whereas as
arrays they hold code units, and in UTF-8 and UTF-16, there can be more than
one code unit per code point. In the general case, calling length on an array
of char or wchar isn't going to give you the number of code points in the
array. So, it's normally incorrect to use length with arrays of char and wchar
in range-based functions.
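As a rough illustration of that distinction, the compile-time checks below
show which types hasLength accepts. This is only a sketch: it assumes a Phobos
where strings are auto-decoded as ranges of dchar (the default behaviour), and
the empty main is there just so that the snippet compiles on its own.

```d
import std.range : hasLength;

// Arrays in general report hasLength as true.
static assert(hasLength!(int[]));
static assert(hasLength!(dchar[]));

// Narrow strings (arrays of char or wchar) are the exception.
static assert(!hasLength!(string));   // immutable(char)[]
static assert(!hasLength!(wstring));  // immutable(wchar)[]

// A dstring has one code unit per code point, so hasLength stays true.
static assert(hasLength!(dstring));

void main() {}
```

Note that the built-in length property itself is untouched. Only the
hasLength trait says no for char and wchar arrays.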
import std.range; // provides walkLength
string str = "hello world";
assert(str.length == walkLength(str)); // both are 11
This works because the string contains only ASCII characters, each of which
fits in one code unit, whereas this doesn't:
auto str = "Привет";
// Fails: str.length is 12 (code units), but walkLength(str) is 6 (code points).
assert(str.length == walkLength(str));
since each of the characters takes more than one code unit. walkLength uses
the length property if hasLength is true, but otherwise it iterates over the
whole range and counts how many elements there are. So, in range-based
functions, we use walkLength, not length, unless it is a section of code where
we know that the range has a length property and that using it directly is
correct (based on the template constraint and/or static ifs that the block of
code is in).
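A minimal sketch of that pattern might look like the following. The function
name countElements is hypothetical and exists only for this example. It is not
part of Phobos, and walkLength already does the same thing internally.

```d
import std.range : hasLength, isInputRange, walkLength;

// Hypothetical helper, for illustration only.
size_t countElements(R)(R range)
    if (isInputRange!R)
{
    static if (hasLength!R)
    {
        // In this branch the range is known to have a usable length.
        return range.length;
    }
    else
    {
        // Narrow strings and length-less ranges end up here,
        // so the elements are counted by iterating.
        return walkLength(range);
    }
}

void main()
{
    assert(countElements([1, 2, 3]) == 3);
    assert(countElements("hello world") == 11);
    assert(countElements("Привет") == 6);  // 6 code points
    assert("Привет".length == 12);         // 12 UTF-8 code units
}
```

The static if is what makes the direct use of length safe here: a string
never reaches the first branch, because hasLength is false for it.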
- Jonathan M Davis