Implicit encoding conversion on string ~= int ?

Marco Leise Marco.Leise at gmx.de
Sun Jun 23 10:12:21 PDT 2013


On Sun, 23 Jun 2013 18:37:16 +0200,
"bearophile" <bearophileHUGS at lycos.com> wrote:

> Adam D. Ruppe:
> 
> > char[] a;
> > int b = 1000;
> > a ~= b;
> >
> > the "a ~= b" is more like "a ~= cast(dchar) b", and then dchar 
> > -> char means it may be multibyte encoded, going from utf-32 to 
> > utf-8.

No no no, this is not what happens. In my case it was:
string a;
int b = 228;  // 'ä' in Latin-1 (code point U+00E4). Note: fits in a single byte!
a ~= b;
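
For what it's worth, here is a small self-contained test of mine. I write
the cast(dchar) explicitly (the cast Adam says happens implicitly) so it
compiles either way, and then dump the raw bytes:

import std.stdio;

void main()
{
    string a;
    int b = 228;
    a ~= cast(dchar) b;               // the conversion Adam describes
    writeln(cast(const(ubyte)[]) a);  // prints [195, 164], i.e. UTF-8 0xC3 0xA4
    writeln(a.length);                // 2 - two code units, not one byte 228
}

So the value is treated as code point U+00E4 and encoded as two UTF-8
bytes rather than stored as the single byte 228.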

Maybe it goes as follows:
o the compiler sees ~= to a string and becomes "aware" of wchar and dchar
  conversions to char
o the appended value is only checked for size (type and signedness are
  lost) and the int is mapped to dchar
o this dchar value is now checked for Unicode conformity and fails the test
o the dchar value is now assumed to be Latin-1, Windows-1252 or similar,
  and a conversion routine is invoked
o the dchar value is converted to utf-8 (see the sketch after this list)
  and...
o the result is appended as a multi-byte sequence to variable "a".
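
To make the "converted to utf-8" step concrete, this is just the standard
UTF-8 two-byte pattern for code points U+0080 .. U+07FF (plain UTF-8
arithmetic, nothing D-runtime-specific):

import std.stdio;

// Standard UTF-8 two-byte encoding: 110xxxxx 10xxxxxx
ubyte[2] encodeTwoByte(dchar c)
{
    assert(c >= 0x80 && c <= 0x7FF);
    return [cast(ubyte)(0xC0 | (c >> 6)),    // leading byte: high 5 bits
            cast(ubyte)(0x80 | (c & 0x3F))]; // continuation: low 6 bits
}

void main()
{
    writeln(encodeTwoByte(228)); // [195, 164], i.e. 0xC3 0xA4
}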

That still doesn't sound right to me, though. What if the dchar value is
not valid Unicode AND >= 256?
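
I don't know which code path the runtime append actually takes, but
std.utf.encode (documented to throw a UTFException on invalid code
points) can at least show how Phobos treats such a value - assuming the
append uses equivalent logic, which is my assumption, not verified:

import std.stdio;
import std.utf : encode, UTFException;

void main()
{
    char[4] buf;
    try
    {
        // 0xD800 is >= 256 but not valid Unicode (a UTF-16 surrogate).
        auto len = encode(buf, cast(dchar) 0xD800);
        writeln(cast(ubyte[]) buf[0 .. len]);
    }
    catch (UTFException e)
    {
        writeln("rejected: ", e.msg);
    }
}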

> I didn't know that, is this already in Bugzilla?
> 
> Bye,
> bearophile

I don't know what exactly is supposed to happen here.

-- 
Marco


