Why is string.front dchar?

Jakob Ovrum jakobovrum at gmail.com
Wed Jan 15 21:56:46 PST 2014


On Tuesday, 14 January 2014 at 11:42:34 UTC, Maxim Fomin wrote:
> The root of the issue is that string literals containing 
> characters which do not fit into a single byte are still 
> converted to a char[] array. Strictly speaking, this is not type 
> safe, because it allows a 2- or 4-byte code unit to be 
> reinterpreted as a sequence of 1-byte characters. The string 
> type is in some sense problematic in D. That's why .front 
> returning dchar is a way to correct the problem; it is not an 
> attempt to introduce confusion.

This assertion makes all the wrong assumptions.

`char` is a UTF-8 code unit[1], and `string` is an array of 
immutable UTF-8 code units. The whole point of UTF-8 is the 
ability to encode code points that need multiple bytes (UTF-8 
code units), so the string literal behaviour is perfectly regular.
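To make the "perfectly regular" behaviour concrete, here is a small sketch (an illustration added here, not code from the thread) of how a literal containing a non-ASCII code point is laid out. U+00E9 needs two bytes in UTF-8, so the literal holds two `char` code units, and both `length` and indexing see those code units.

```d
import std.stdio : writeln;

void main()
{
    string s = "\u00E9"; // U+00E9, encoded in UTF-8 as 0xC3 0xA9

    // length counts UTF-8 code units, not code points
    assert(s.length == 2);

    // indexing yields individual code units
    assert(s[0] == 0xC3);
    assert(s[1] == 0xA9);

    // the element type of string is immutable(char), i.e. a UTF-8 code unit
    static assert(is(typeof(s[0]) == immutable(char)));

    writeln(s.length);
}
```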

Operations on code units are rare, which is why the standard 
library instead treats strings as ranges of code points, for 
correctness by default. However, we must not prevent the user 
from being able to work on arrays of code units, as many string 
algorithms can be optimized by not doing full UTF decoding. The 
standard library does this on many occasions, and there are more 
to come.
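A minimal sketch of both views of the same `string`, assuming a Phobos that auto-decodes narrow strings in its range primitives (as described above) and that provides `std.utf.byCodeUnit`, which may not exist in the Phobos of this post's era. `front` and `walkLength` work in code points and yield `dchar`, while `length`, indexing and `byCodeUnit` stay at the code unit level, which is the cheap path that decoding-free algorithms rely on.

```d
import std.range.primitives : front, walkLength;
import std.stdio : writeln;
import std.utf : byCodeUnit;

void main()
{
    string s = "\u00E9t\u00E9"; // 3 code points, 5 UTF-8 code units

    // Range primitives decode: front yields a whole code point as dchar.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == '\u00E9');
    assert(s.walkLength == 3); // iterates code points

    // The array view counts code units.
    assert(s.length == 5);

    // byCodeUnit opts out of decoding; each element is one code unit.
    auto units = s.byCodeUnit;
    assert(units.front == 0xC3);
    assert(units.walkLength == 5);

    writeln(s.walkLength, " ", s.length);
}
```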

Note that in Unicode terminology, an unqualified "character" is 
the abstract character that a code *point* encodes, which is very 
different from a *glyph*, which is what people generally associate 
with the word "character". Thus, `string` is not an array of 
characters (i.e. an array where each element is a character), but 
`dstring` can be said to be.
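A sketch of the code point vs. glyph distinction, assuming `std.uni.byGrapheme` (a Phobos facility not mentioned in the post) as a stand-in for what users perceive as one character. A precomposed "é" is a single `dstring` element, but "e" followed by U+0301 COMBINING ACUTE ACCENT displays the same way and still takes two elements, one per code point.

```d
import std.range.primitives : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // Precomposed U+00E9: one code point, so one dstring element.
    dstring precomposed = "\u00E9"d;
    assert(precomposed.length == 1);

    // 'e' + U+0301 COMBINING ACUTE ACCENT: same visible glyph,
    // but two code points, so two dstring elements.
    dstring combining = "e\u0301"d;
    assert(combining.length == 2);
    assert(combining[1] == '\u0301');

    // byGrapheme groups code points into grapheme clusters
    // (user-perceived characters); here the two code points form one.
    assert(combining.byGrapheme.walkLength == 1);

    writeln(precomposed.length, " ", combining.length);
}
```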

[1] http://dlang.org/type


More information about the Digitalmars-d-learn mailing list