Implicit encoding conversion on string ~= int ?
Marco Leise
Marco.Leise at gmx.de
Sun Jun 23 10:12:21 PDT 2013
Am Sun, 23 Jun 2013 18:37:16 +0200
schrieb "bearophile" <bearophileHUGS at lycos.com>:
> Adam D. Ruppe:
>
> > char[] a;
> > int b = 1000;
> > a ~= b;
> >
> > the "a ~= b" is more like "a ~= cast(dchar) b", and then dchar
> > -> char means it may be multibyte encoded, going from utf-32 to
> > utf-8.
No no no, this is not what happens. In my case it was:
string a;
int b = 228; // CP850 value for 'ä'. Note: fits in a single byte!
a ~= b;
Maybe it goes as follows:
o the compiler sees ~= on a string and becomes "aware" of wchar and dchar
conversions to char
o the appended value is only checked for size (type and signedness are lost),
so the int maps to dchar
o this dchar value is then checked for Unicode conformity and fails the test
o the dchar value is now assumed to be Latin-1, Windows-1252 or similar,
and a conversion routine is invoked
o the dchar value is converted to UTF-8 and...
o appended as a multi-byte sequence to the variable "a".
That still doesn't sound right to me, though. What if the dchar value is
not valid Unicode AND >= 256?
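For reference, a minimal sketch of the case I mean, assuming a compiler of
that era which still accepts `string ~= int` (newer compilers may deprecate
or reject the implicit conversion). The int ends up being treated as a
Unicode code point, not a raw byte:

```d
import std.stdio;

void main()
{
    string a;
    int b = 228; // intended as the CP850 byte for 'ä'
    a ~= b;      // accepted roughly as if `a ~= cast(dchar) b`

    // 228 is interpreted as code point U+00E4 ('ä' in Unicode),
    // which UTF-8 encodes as the two bytes C3 A4 -- so the string
    // grows by two bytes, not the single byte one might expect.
    writefln("length = %s", a.length);
    writefln("%(%02X %)", cast(const(ubyte)[]) a);
}
```

Note that U+00E4 happens to be valid Unicode, which is why the append
succeeds here; my question above is about values where that coincidence
does not hold.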
> I didn't know that, is this already in Bugzilla?
>
> Bye,
> bearophile
I don't know what exactly is supposed to happen here.
--
Marco
More information about the Digitalmars-d
mailing list