char array weirdness

Mon Mar 28 16:02:26 PDT 2016

On Mon, Mar 28, 2016 at 10:49:28PM +0000, Jack Stouffer via Digitalmars-d-learn wrote:
> On Monday, 28 March 2016 at 22:43:26 UTC, Anon wrote:
> >On Monday, 28 March 2016 at 22:34:31 UTC, Jack Stouffer wrote:
> >>void main () {
> >>    import std.range.primitives;
> >>    char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];
> >>    pragma(msg, ElementEncodingType!(typeof(val)));
> >>    pragma(msg, typeof(val.front));
> >>}
> >>
> >>prints
> >>
> >>    char
> >>    dchar
> >>
> >>Why?
> >
> >Unicode! `char` is UTF-8, which means a character can be from 1 to 4
> >bytes. val.front gives a `dchar` (UTF-32), consuming those bytes and
> >giving you a sensible value.
> 
> But the value fits into a char; a dchar is a waste of space. Why on
> Earth would a different type be given for the front value than the
> type of the elements themselves?

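Welcome to the world of auto-decoding.  The Phobos range primitives
(front, popFront, etc.) treat any string / wstring / dstring, including
mutable char[] and wchar[], as a range of dchar: they decode one code
point at a time, even if the underlying data is encoded as UTF-8.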
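
To make the distinction concrete, here is a minimal sketch (the variable
names and the sample string are mine, not from your code) showing that
the array's storage type stays char while the range element type
becomes dchar, and that length counts code units while walkLength
counts decoded code points:

```d
import std.range.primitives : ElementEncodingType, ElementType, front, walkLength;

void main()
{
    char[] val = ['1', '0', 'h'];

    // What is stored in memory: UTF-8 code units.
    static assert(is(ElementEncodingType!(typeof(val)) == char));
    // What the range primitives hand out: decoded code points.
    static assert(is(ElementType!(typeof(val)) == dchar));
    static assert(is(typeof(val.front) == dchar));

    // U+00E9 takes two code units in UTF-8 but is a single code point.
    string s = "\u00e9";
    assert(s.length == 2);       // code units
    assert(s.walkLength == 1);   // code points, via auto-decoding
    assert(s.front == '\u00e9'); // front decodes both bytes into one dchar
}
```

So for pure-ASCII data like yours the decoded value always fits in a
char, but the type of front cannot depend on the run-time contents.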

The pros and cons of auto-decoding have been debated to death several
times already. Walter hates it and wishes to get rid of it, but so far
Andrei has refused to budge.  Personally I lean on the side of killing
auto-decoding, but it seems unlikely to change at this point.  (But you
never know... if enough people revolt against it, maybe there's a small
chance Andrei could be convinced...)

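For the time being, I'd recommend std.utf.byCodeUnit as a workaround: it
wraps the string in a range that iterates over the code units themselves
(char for a char[]) without decoding.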
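
A rough sketch of that workaround, applied to the array from your
example (the name `units` is just for illustration, and Unqual is only
there to ignore any type qualifiers on the element type):

```d
import std.traits : Unqual;
import std.utf : byCodeUnit;

void main()
{
    char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];

    // Wrap the array so that range primitives see code units, not code points.
    auto units = val.byCodeUnit;

    // front is now a char instead of a dchar.
    static assert(is(Unqual!(typeof(units.front)) == char));
    assert(units.front == '1');

    // The wrapper exposes the length in code units.
    assert(units.length == 9);
}
```

Keep in mind that with byCodeUnit a non-ASCII character shows up as
several separate elements, so this is only appropriate when you want to
work at the code unit level (e.g. when you know the data is ASCII).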

T

-- 
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N