char array weirdness
Anon via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Mon Mar 28 16:06:49 PDT 2016
On Monday, 28 March 2016 at 22:49:28 UTC, Jack Stouffer wrote:
> On Monday, 28 March 2016 at 22:43:26 UTC, Anon wrote:
>> On Monday, 28 March 2016 at 22:34:31 UTC, Jack Stouffer wrote:
>>> void main () {
>>>     import std.range.primitives;
>>>     char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];
>>>     pragma(msg, ElementEncodingType!(typeof(val)));
>>>     pragma(msg, typeof(val.front));
>>> }
>>>
>>> prints
>>>
>>> char
>>> dchar
>>>
>>> Why?
>>
>> Unicode! `char` is UTF-8, which means a character can be from
>> 1 to 4 bytes. val.front gives a `dchar` (UTF-32), consuming
>> those bytes and giving you a sensible value.
>
> But the value fits into a char;
The compiler doesn't know that, and it isn't true in general. You
could have, for example, U+3042 in your char[]. That would be
encoded as three chars. It wouldn't make sense (or be correct)
for val.front to yield '\xe3' (the first byte of U+3042 in UTF-8).
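To make that concrete, here is a minimal sketch of what happens when the array really does contain U+3042. The sample string is made up for illustration, and the sketch assumes Phobos' usual decoding behaviour for `front` and `popFront` on a `char[]`:

```d
import std.range.primitives : front, popFront;

void main()
{
    // Illustrative input: U+3042 (HIRAGANA LETTER A) followed by ASCII 'h'.
    char[] val = "\u3042h".dup;

    // The array stores UTF-8 code units: three for U+3042, one for 'h'.
    assert(val.length == 4);
    assert(val[0] == 0xE3); // indexing gives the raw first byte

    // front decodes the whole code point rather than returning val[0].
    static assert(is(typeof(val.front) == dchar));
    assert(val.front == '\u3042');

    // popFront drops all three code units of that code point at once.
    val.popFront();
    assert(val.length == 1);
    assert(val.front == 'h');
}
```

Indexing (`val[0]`) still gives you the individual `char`; only the range primitives decode.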
> a dchar is a waste of space.
If you're processing Unicode text, you *need* to use that space.
And because you're using ranges, it is only 3 extra bytes at a
time (the one decoded `dchar`), anyway. It isn't going to hurt on
modern systems.
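A rough sketch of where those 3 bytes come from, plus one way to skip decoding if you know you only want code units. `std.utf.byCodeUnit` was not mentioned earlier in the thread; I'm suggesting it here as an option, assuming your Phobos is recent enough to have it:

```d
import std.range.primitives : front;
import std.utf : byCodeUnit;

void main()
{
    // A decoded dchar is 4 bytes, a char is 1: hence "3 extra bytes".
    static assert(char.sizeof == 1);
    static assert(dchar.sizeof == 4);

    char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];

    // The array itself is still stored as one byte per element.
    assert(val.length == 9);

    // Default range view: front is a decoded dchar.
    static assert(is(typeof(val.front) == dchar));

    // Code-unit view: front is a single 1-byte code unit, no decoding.
    auto units = val.byCodeUnit;
    static assert(typeof(units.front).sizeof == 1);
    assert(units.front == '1');
}
```

The code-unit view is only safe to treat as "characters" when the data is known to be ASCII, as it is in your example.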
> Why on Earth would a different type be given for the front
> value than the type of the elements themselves?
Unicode. A single char cannot hold an arbitrary Unicode code
point. A single dchar can.