Improving D's support of code-pages
Lars Noschinski
lars-2006-1 at usenet.noschinski.de
Mon Aug 20 07:17:03 PDT 2007
* Sean Kelly <sean at f4.ca> [07-08-20 02:40]:
>Anders F Björklund wrote:
>>It was my understanding that D by design only supports UTF environments,
>>and the behaviour on legacy systems (CP437/ISO-8859-1) is "undefined"...
>>It's not only output: if you run on such a system and try to read the
>>args (char[][]), you can get a UTF exception because they are malformed.
>
>Tango converts the input args to UTF-8 on Win32 rather than just accepting them
>as they are. The args are left alone on Unix however, because most Unix
>consoles seem to use Unicode anyway.
Probably args should be (u)byte[][] anyway. Converting command-line
arguments could have pretty annoying effects. For example, Unix
filenames may contain any 8-bit value except '/' and '\0', and arguments
may contain any byte except '\0'. They are also charset-agnostic: the
only place where the charset matters is the terminal emulator. All other
parts of the system treat them as binary data.
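Because a Unix argument is just a byte string, a program that does not know the terminal's charset can at most check an argument, not reliably convert it. The C sketch below tests whether an argument is well-formed UTF-8. C is used because argv is a C-level interface, and the helper name is_valid_utf8 is hypothetical, not a Phobos or Tango function. An ISO-8859-1 "é" (the single byte 0xE9) fails the check, which is exactly the case that produces the UTF exception mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the NUL-terminated byte string s is
   well-formed UTF-8. argv entries on Unix are arbitrary bytes (anything but
   '\0'), so an argument typed on a Latin-1 terminal may fail this check. */
bool is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        unsigned char c = *p;
        size_t len;
        unsigned int cp;
        if (c < 0x80) { p++; continue; }                 /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;         /* stray continuation byte or bad lead byte */
        for (size_t i = 1; i < len; i++) {
            /* also stops at the terminating NUL, which is not 10xxxxxx */
            if ((p[i] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (p[i] & 0x3F);
        }
        /* reject overlong encodings, UTF-16 surrogates and values > U+10FFFF */
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        p += len;
    }
    return true;
}
```

A program following the suggestion above would keep such arguments as raw bytes, pass them unchanged to calls like open(), and only validate (or decode) when it really needs text.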
Also, an automatic charset conversion on console output would probably
be annoying, as stdin and stdout are often used to read and write binary
data, as in:

    tar -cf - foo | gzip -9 | split - targzipped-foo
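A minimal sketch of the damage, assuming the output layer transcodes ISO-8859-1 to UTF-8. The function latin1_to_utf8 is a hypothetical illustration, not part of Phobos or Tango. Every byte at or above 0x80 expands to two bytes, so the gzip magic number 0x1F 0x8B at the start of the pipeline's stream would come out as 0x1F 0xC2 0x8B, and the data would no longer start with the gzip signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transcoder: ISO-8859-1 -> UTF-8, the kind of conversion an
   automatic console-output layer might apply. Bytes < 0x80 pass through;
   every other byte becomes a two-byte UTF-8 sequence.
   `out` must have room for 2 * n bytes. Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation */
        }
    }
    return o;
}
```

Applied to the first three bytes of a gzip stream (0x1F 0x8B 0x08), this yields four bytes, which is why binary data must bypass any such conversion.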
So at the very least, one should use isatty() to decide whether the
input/output is an interactive terminal.
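A sketch of that check in C, where isatty() is the POSIX function (D code can declare and call it through extern(C)). The policy of transcoding only when the descriptor is a terminal is an assumption that follows the suggestion above, and should_transcode is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical policy: only an interactive terminal has a meaningful
   charset, so only then is conversion applied. Pipes and regular files
   (as in the tar | gzip | split pipeline) carry raw bytes and are left
   alone. isatty() returns 1 for a terminal and 0 otherwise. */
bool should_transcode(int fd)
{
    return isatty(fd) == 1;
}
```

A caller would test STDOUT_FILENO before writing text and STDIN_FILENO before decoding input. Redirected or piped streams then pass through byte for byte.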