char array weirdness

Mon Mar 28 16:06:49 PDT 2016

On Monday, 28 March 2016 at 22:49:28 UTC, Jack Stouffer wrote:
> On Monday, 28 March 2016 at 22:43:26 UTC, Anon wrote:
>> On Monday, 28 March 2016 at 22:34:31 UTC, Jack Stouffer wrote:
>>> void main () {
>>>     import std.range.primitives;
>>>     char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 
>>> 's'];
>>>     pragma(msg, ElementEncodingType!(typeof(val)));
>>>     pragma(msg, typeof(val.front));
>>> }
>>>
>>> prints
>>>
>>>     char
>>>     dchar
>>>
>>> Why?
>>
>> Unicode! `char` is UTF-8, which means a character can be from 
>> 1 to 4 bytes. val.front gives a `dchar` (UTF-32), consuming 
>> those bytes and giving you a sensible value.
>
> But the value fits into a char;

The compiler doesn't know that, and it isn't true in general. You 
could have, for example, U+3042 in your char[]. That would be 
encoded as three chars. It wouldn't make sense (or be correct) 
for val.front to yield '\xe3' (the first byte of U+3042 in UTF-8).
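
To make the U+3042 case concrete, here is a minimal sketch. The helper `firstCodePointLength` is a made-up name for illustration, not something from Phobos; the rest relies only on `front` and `popFront` from `std.range.primitives`.

```d
import std.range.primitives : front, popFront;

// Hypothetical helper: how many UTF-8 code units the first code point occupies.
size_t firstCodePointLength(const(char)[] s)
{
    auto rest = s;
    rest.popFront();               // skips one whole code point, not one char
    return s.length - rest.length;
}

void main()
{
    // U+3042 followed by an ASCII 'h'.
    char[] val = "\u3042h".dup;

    // Four code units in total: three for U+3042, one for 'h'.
    assert(val.length == 4);
    assert(val[0] == 0xE3);        // first byte of U+3042 in UTF-8

    // front decodes the full three-byte sequence into one code point.
    static assert(is(typeof(val.front) == dchar));
    assert(val.front == '\u3042');

    assert(firstCodePointLength(val) == 3);
    assert(firstCodePointLength("h") == 1);
}
```

If front returned a char here, the caller would get 0xE3, which is not a character on its own.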

> a dchar is a waste of space.

If you're processing Unicode text, you *need* to use that space. 
And because you're using ranges, only one decoded dchar exists at 
a time, so it is only 3 extra bytes, anyway. It isn't going to hurt 
on modern systems.
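
A rough sketch of where the "3 extra bytes" figure comes from, assuming you walk the array through the range primitives. `countCodePoints` is a hypothetical helper written for this example only.

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: counts code points, decoding one dchar at a time.
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    for (; !s.empty; s.popFront())
    {
        dchar c = s.front; // the only decoded value alive at any moment
        ++n;
    }
    return n;
}

void main()
{
    static assert(char.sizeof == 1);
    static assert(dchar.sizeof == 4); // 3 bytes more than a char

    // The array itself stays UTF-8; only the current element is widened.
    assert(countCodePoints("10h36m28s") == 9);
    assert(countCodePoints("\u3042h") == 2);
}
```

The storage for the text is unchanged; the extra cost is one 4-byte local per decoded element.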

> Why on Earth would a different type be given for the front 
> value than the type of the elements themselves?

Unicode. A single char is one UTF-8 code unit, so it can only 
hold code points below U+0080. A single dchar can hold any code 
point.
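
If you know the data is ASCII and really want char elements, one option (a sketch, not the only way) is to ask for code units explicitly. This assumes `std.utf.byCodeUnit` and `std.string.representation` are available in your Phobos version.

```d
import std.range.primitives : ElementType, front;
import std.string : representation;
import std.utf : byCodeUnit;

void main()
{
    char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];

    // Default range view of a char[]: elements are decoded code points.
    static assert(is(ElementType!(typeof(val)) == dchar));

    // byCodeUnit iterates the raw UTF-8 code units, so elements are char.
    auto units = val.byCodeUnit;
    static assert(is(ElementType!(typeof(units)) == char));
    assert(units.front == '1');

    // representation views the same memory as raw bytes.
    static assert(is(typeof(val.representation()) == ubyte[]));

    // Plain indexing never decodes either.
    static assert(is(typeof(val[0]) == char));
}
```

With either view you are responsible for not splitting a multi-byte sequence if non-ASCII text ever shows up.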