Implicit encoding conversion on string ~= int ?

Marco Leise Marco.Leise at gmx.de
Sun Jun 23 08:06:41 PDT 2013


I've seen some C code that does something like string[i] =
int, which seems to implicitly cast the int to a char.
Now in D to get it running I just did string ~= int and
wondered why the resulting code page 850 string looked correct
on the UTF-8 terminal. Then I asserted that 'string' only ever
grows by one byte for each append and the assertion failed. So
there is a hidden conversion from some charset (probably
Windows or Latin-1?) to a UTF-8 multi-byte string going on.
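
A minimal sketch of the effect, assuming the appended int ends up
being treated as a Unicode code point (a dchar). The explicit cast
and the helper name appendCodePoint are assumptions made for
illustration, so the sketch does not depend on whether the compiler
accepts the implicit int conversion:

```d
import std.stdio;

// Hypothetical helper: append `value` to `s` as a code point.
// Appending a dchar to a string stores it UTF-8 encoded.
string appendCodePoint(string s, int value)
{
    s ~= cast(dchar) value;
    return s;
}

void main()
{
    // 0x41 ('A') is ASCII and takes one code unit.
    assert(appendCodePoint("", 0x41).length == 1);

    // 0xE4 is above 0x7F, so it takes two UTF-8 code units (0xC3 0xA4).
    string s = appendCodePoint("", 0xE4);
    assert(s.length == 2);
    assert(s[0] == 0xC3 && s[1] == 0xA4);

    writeln(s.length);
}
```

Any value above 0x7F grows the string by more than one byte, which
would explain the failing assertion.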

While it is convenient, this code uses some form of LZ77 and
will from time to time append copies of previous parts of
'string' to it. In that case the byte offsets wouldn't match
any more and the result would be garbage.

Eventually I'd have looked over the code and created the CP850
string in a temporary ubyte[], but in the meantime I wonder
what the rationale behind this automatic conversion is and if
we want to keep it like that. Is this documented behavior?
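
A rough sketch of the ubyte[] approach, assuming the decoder emits
one code page 850 byte at a time plus LZ77-style back references.
The helper names putByte and copyBack are hypothetical and not
taken from the code in question:

```d
import std.stdio;

// Hypothetical helper: append one raw byte, with no transcoding.
void putByte(ref ubyte[] buf, int value)
{
    buf ~= cast(ubyte) value; // always grows by exactly one byte
}

// Hypothetical helper: LZ77-style copy of `len` bytes that start
// `dist` bytes before the current end of the buffer.
void copyBack(ref ubyte[] buf, size_t dist, size_t len)
{
    assert(dist >= 1 && dist <= buf.length);
    foreach (_; 0 .. len)
        buf ~= buf[$ - dist]; // byte by byte, so overlapping copies work
}

void main()
{
    ubyte[] buf;
    putByte(buf, 0x84); // 'ä' in code page 850
    putByte(buf, 0x62); // 'b'
    copyBack(buf, 2, 4);

    ubyte[] expected = [0x84, 0x62, 0x84, 0x62, 0x84, 0x62];
    assert(buf == expected);
    writeln(buf.length);
}
```

Because every append adds exactly one element, the byte offsets used
by the back references stay valid. Converting to UTF-8 can then
happen once, after decompression has finished.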

-- 
Marco


