typeof(string.front) should be char

Sat Mar 3 00:40:47 PST 2012

On Friday, March 02, 2012 20:41:35 Ali Çehreli wrote:
> On 03/02/2012 06:30 PM, Piotr Szturmaj wrote:
>  > Hello,
>  > 
>  > For this code:
>  > 
>  > auto c = "test"c;
>  > auto w = "test"w;
>  > auto d = "test"d;
>  > pragma(msg, typeof(c.front));
>  > pragma(msg, typeof(w.front));
>  > pragma(msg, typeof(d.front));
>  > 
>  > compiler prints:
>  > 
>  > dchar
>  > dchar
>  > immutable(dchar)
>  > 
>  > IMO it should print this:
>  > 
>  > immutable(char)
>  > immutable(wchar)
>  > immutable(dchar)
>  > 
>  > Is it a bug?
> 
> No, that's by design. When used as InputRange ranges, slices of any
> character type are exposed as ranges of dchar.

Indeed.

Strings are always treated as ranges of dchar, because it generally makes no 
sense to operate on individual chars or wchars. A char is a UTF-8 code unit. A 
wchar is a UTF-16 code unit. And a dchar is a UTF-32 code unit. The _only_ one 
of those which is guaranteed to be a code point is dchar, since in UTF-32, all 
code points are a single code unit. If you were to operate on individual chars 
or wchars, you'd be operating on pieces of characters rather than whole 
characters, which wreaks havoc with Unicode.
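
As a rough illustration of the code unit vs. code point distinction, here is a minimal sketch. The helper names codeUnitCount and codePointCount are made up for this example, and it assumes a Phobos where importing std.range provides front and walkLength for strings:

```d
import std.range : front, walkLength;

// Hypothetical helper: .length on a string counts UTF-8 code units (no decoding).
size_t codeUnitCount(string s) { return s.length; }

// Hypothetical helper: walking the string as a range decodes one code point at a time.
size_t codePointCount(string s) { return s.walkLength; }

void main()
{
    // U+00E9 (e with acute accent) takes two code units in UTF-8.
    string s = "h\u00E9llo";

    // The range primitive decodes, so front is a dchar...
    static assert(is(typeof(s.front) == dchar));
    // ...while indexing hands back a single, undecoded code unit.
    static assert(is(typeof(s[0]) == immutable(char)));

    assert(codeUnitCount(s) == 6);
    assert(codePointCount(s) == 5);
    assert(s.front == 'h');
}
```

Indexing or slicing at s[1] or s[2] here would land on half of the accented letter, which is the kind of breakage that treating strings as ranges of dchar avoids.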

Now, technically speaking, a code point isn't necessarily a full character, 
since you can also combine code points (e.g. adding a subscript to a letter). 
A full character is what's called a grapheme, and unfortunately, at the 
moment, Phobos doesn't have a way to operate on graphemes. But operating on 
code points is _far_ more correct than operating on code units. It's also more 
efficient than operating on graphemes would be.
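
To make the code point vs. grapheme gap concrete, here is a small sketch. It assumes a Phobos release newer than the one discussed in this thread, since std.uni.byGrapheme was only added later; graphemeCount and codePointCount are hypothetical helper names:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: counts user-perceived characters (graphemes).
// Assumes a Phobos version that ships std.uni.byGrapheme.
size_t graphemeCount(string s) { return s.byGrapheme.walkLength; }

// Hypothetical helper: counts code points (the string walked as a range of dchar).
size_t codePointCount(string s) { return s.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";

    assert(s.length == 3);           // three UTF-8 code units
    assert(codePointCount(s) == 2);  // two code points
    assert(graphemeCount(s) == 1);   // one grapheme
}
```

Splitting this string between its two code points would still separate the accent from its letter, so code point iteration is closer to correct than code unit iteration but is not the whole story.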

Unfortunately, in order to code completely efficiently with Unicode, you have 
to understand quite a bit about it, which most programmers don't, but by 
operating on ranges of code points, Phobos manages to be correct in the 
majority of cases.

So, yes. It's very much on purpose that all strings are treated as ranges of 
dchar.

- Jonathan M Davis