char array weirdness
H. S. Teoh via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Mon Mar 28 16:02:26 PDT 2016
On Mon, Mar 28, 2016 at 10:49:28PM +0000, Jack Stouffer via Digitalmars-d-learn wrote:
> On Monday, 28 March 2016 at 22:43:26 UTC, Anon wrote:
> >On Monday, 28 March 2016 at 22:34:31 UTC, Jack Stouffer wrote:
> >>void main () {
> >> import std.range.primitives;
> >> char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];
> >> pragma(msg, ElementEncodingType!(typeof(val)));
> >> pragma(msg, typeof(val.front));
> >>}
> >>
> >>prints
> >>
> >> char
> >> dchar
> >>
> >>Why?
> >
> >Unicode! `char` is UTF-8, which means a character can be from 1 to 4
> >bytes. val.front gives a `dchar` (UTF-32), consuming those bytes and
> >giving you a sensible value.
>
> But the value fits into a char; a dchar is a waste of space. Why on
> Earth would a different type be given for the front value than the
> type of the elements themselves?
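
Welcome to the world of auto-decoding. Phobos's range primitives always
treat any string, wstring, or dstring (in fact any array of char, wchar,
or dchar, mutable or not) as a range of dchar, decoding the UTF-8 or
UTF-16 code units on the fly. That is why ElementEncodingType (the
stored code unit type) is char while front returns dchar.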
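
A minimal sketch of that behaviour, assuming a reasonably recent DMD and
Phobos. The test string "h\u00e9llo" is an arbitrary illustration chosen
because U+00E9 takes two UTF-8 code units:

```d
// Sketch: how Phobos range primitives auto-decode a char[].
import std.range.primitives : ElementEncodingType, ElementType, front, popFront, walkLength;

void main()
{
    // The stored element (code unit) type is char,
    // but the range element type seen through front is dchar.
    static assert(is(ElementEncodingType!(char[]) == char));
    static assert(is(ElementType!(char[]) == dchar));

    // U+00E9 is one code point encoded as two UTF-8 code units.
    char[] val = "h\u00e9llo".dup;
    assert(val.length == 6);      // .length counts code units
    assert(val.walkLength == 5);  // walkLength decodes and counts code points

    val.popFront();               // drops 'h' (one code unit)
    dchar c = val.front;          // decodes two code units into one dchar
    assert(c == '\u00e9');
    val.popFront();               // drops both code units of U+00E9
    assert(val.length == 3);      // "llo" remains
}
```

Note that .length and walkLength disagree on narrow strings: the array
length counts bytes, while the range view counts decoded code points.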

The pros and cons of auto-decoding have been debated to death several
times already. Walter hates it and wishes to get rid of it, but so far
Andrei has refused to budge. Personally I lean on the side of killing
auto-decoding, but it seems unlikely to change at this point. (But you
never know... if enough people revolt against it, maybe there's a small
chance Andrei could be convinced...)
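
For the time being, I'd recommend std.utf.byCodeUnit as a workaround:
it wraps a string as a range of its code units (char for a char[] or
string), so front returns char and nothing is decoded.
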
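
A minimal sketch of the byCodeUnit workaround applied to the array from
the original question, assuming a reasonably recent DMD and Phobos; the
second string "h\u00e9llo" is an arbitrary multi-byte illustration:

```d
// Sketch: std.utf.byCodeUnit as a workaround for auto-decoding.
import std.range.primitives : ElementType, front, walkLength;
import std.utf : byCodeUnit;

void main()
{
    char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];

    // Without byCodeUnit, front auto-decodes and yields dchar.
    static assert(is(typeof(val.front) == dchar));

    // With byCodeUnit, the range yields the raw char code units.
    auto units = val.byCodeUnit;
    static assert(is(ElementType!(typeof(units)) == char));
    static assert(is(typeof(units.front) == char));
    assert(units.front == '1');

    // For multi-byte text the two views count differently.
    string s = "h\u00e9llo";
    assert(s.walkLength == 5);            // code points (decoded)
    assert(s.byCodeUnit.walkLength == 6); // code units (not decoded)
}
```
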
T
--
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N