ElementType!string
Jakob Ovrum
jakobovrum at gmail.com
Sun Aug 25 12:56:33 PDT 2013
On Sunday, 25 August 2013 at 19:25:08 UTC, qznc wrote:
> Apparently, ElementType!string evaluates to dchar. I would have
> expected char. Why is that?
It is mentioned in the documentation of `ElementType`. Use
`std.range.ElementEncodingType` or `std.traits.ForeachType` to
get the code unit type instead (`char` or `wchar`, with the
array's qualifiers kept, so `immutable(char)` for `string`).
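A few compile-time checks make the difference visible. This is a minimal sketch written against a current Phobos. The traits now live in `std.range.primitives`, which `std.range` re-exports. Note that the qualifiers are preserved, so the encoding type of `string` is `immutable(char)`:

```d
import std.range : ElementType, ElementEncodingType;
import std.traits : ForeachType;

// As a range, a narrow string yields decoded code points (dchar).
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!wstring == dchar));

// The encoding (storage) type is the array's own element type, qualifiers
// included: string is immutable(char)[], so this is immutable(char).
static assert(is(ElementEncodingType!string == immutable(char)));
static assert(is(ElementEncodingType!wstring == immutable(wchar)));

// foreach without an explicit loop variable type iterates code units too.
static assert(is(ForeachType!string == immutable(char)));

// With a mutable array there is no qualifier to keep.
static assert(is(ElementEncodingType!(char[]) == char));

void main() {}
```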
As for the rationale:
`string`, being an alias for `immutable(char)[]`, is an array of
UTF-8 code units, that is, an array of `char`s. As a range,
however, it is a forward range of code points, each represented
as a `dchar` (a UTF-32 code unit). This (slightly controversial)
choice was made to make Unicode-correct code the easiest and most
intuitive to write, as code points are much more useful than code
units.
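The small sketch below makes the two views concrete, assuming a current compiler and Phobos. `.length` and indexing work on the UTF-8 code units. Range primitives such as `walkLength` and a `foreach` with a `dchar` loop variable work on decoded code points:

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    // U+00E9 (é) takes two UTF-8 code units, the other letters one each.
    string s = "h\u00E9llo";

    // Array view: .length and indexing see code units (char).
    assert(s.length == 6);
    static assert(is(typeof(s[1]) == immutable(char)));

    // Range view: front/popFront decode, so walking counts code points.
    assert(s.walkLength == 5);

    // foreach with an explicit dchar loop variable also decodes.
    size_t points = 0;
    foreach (dchar c; s)
        ++points;
    assert(points == 5);

    writeln(s.length, " code units, ", s.walkLength, " code points");
}
```

Running it should print `6 code units, 5 code points`.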
Note that it is not a random-access range. UTF-8 and UTF-16 are
variable-length encodings, so a single code point may take
several code units. Hence, getting the n'th code point of a UTF-8
or UTF-16 string requires decoding from the start, which takes
linear rather than constant time.
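Here is a sketch of what this means in practice, assuming current Phobos. `nthCodePoint` is a hypothetical helper for illustration and is not a library function. The range traits report that a string is not random-access. Fetching the n'th code point means popping (and so decoding) the n code points before it:

```d
import std.range : drop, front, hasLength, isBidirectionalRange,
    isForwardRange, isRandomAccessRange;

// Range traits for a UTF-8 string: it can be walked in both directions,
// but as a range it offers neither O(1) indexing nor a length.
static assert(isForwardRange!string);
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);

// Hypothetical helper (not part of Phobos): returns the n'th code point
// (0-based). drop pops, and therefore decodes, the n code points before it,
// so the cost grows linearly with n.
dchar nthCodePoint(string s, size_t n)
{
    return s.drop(n).front;
}

void main()
{
    string s = "h\u00E9llo";
    assert(s[1] == 0xC3);                   // indexing gives a code unit: the lead byte of é
    assert(nthCodePoint(s, 1) == '\u00E9'); // decoding gives the code point é
    assert(nthCodePoint(s, 2) == 'l');
}
```

If you need many lookups by code point, one option is to convert once with `std.conv.to!dstring`. That makes indexing constant-time at the cost of four bytes per code point.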
Another name for a code point is "character" (technically, a
character is what the code point translates to in the UCS).
However, the name can be deceptive: the units we see on screen
when text is rendered are "graphemes", which can span several
code points, as Unicode characters can be combining, zero-width,
and so on.
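The gap between code points and graphemes shows up with combining characters. This minimal sketch assumes a Phobos recent enough to have `std.uni.byGrapheme`. The string is "e" followed by a combining acute accent, which renders as a single "é":

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT is displayed as one "é".
    string s = "e\u0301";

    assert(s.length == 3);                // UTF-8 code units (1 + 2)
    assert(s.walkLength == 2);            // code points
    assert(s.byGrapheme.walkLength == 1); // graphemes
}
```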
To get a range of UTF-8 or UTF-16 code units, the code units have
to be represented as something other than `char` and `wchar`. For
example, you can cast your string to `immutable(ubyte)[]` to
operate on that, then cast it back at a later point.
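Below is a minimal sketch of the cast-and-cast-back approach, using only Phobos. After the cast, the elements are plain bytes and the array is a random-access range again. Range algorithms such as `count` then see code units rather than code points:

```d
import std.algorithm : count;
import std.range : ElementType, isRandomAccessRange;

void main()
{
    string s = "h\u00E9llo"; // é is encoded as the two bytes 0xC3 0xA9

    // Reinterpret the code units as bytes. No data is copied or checked;
    // both slices refer to the same memory.
    immutable(ubyte)[] bytes = cast(immutable(ubyte)[]) s;

    // The range view now matches the array view: one element per code unit.
    static assert(is(ElementType!(typeof(bytes)) == immutable(ubyte)));
    static assert(isRandomAccessRange!(typeof(bytes)));
    assert(bytes.length == 6);
    assert(bytes.count(0xC3) == 1); // algorithms see raw bytes, not code points

    // Cast back to a string once the byte-level work is done.
    string back = cast(string) bytes;
    assert(back == s);
}
```

The cast neither copies nor validates anything. It is therefore up to you to cast back only data that is still valid UTF-8.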
More information about the Digitalmars-d-learn mailing list