Improving D's support of code-pages
Kirk McDonald
kirklin.mcdonald at gmail.com
Sat Aug 18 14:53:31 PDT 2007
Walter Bright wrote:
> Kirk McDonald wrote:
>
>> ----
>> Additions to Phobos
>> ----
>>
>> The first thing Phobos needs is the following set of functions. (Their
>> basic interface has been cribbed from Python.)
>>
>> char[] decode(ubyte[] str, string encoding, string error="strict");
>> wchar[] wdecode(ubyte[] str, string encoding, string error="strict");
>> dchar[] ddecode(ubyte[] str, string encoding, string error="strict");
>>
>> ubyte[] encode(char[] str, string encoding, string error="strict");
>> ubyte[] encode(wchar[] str, string encoding, string error="strict");
>> ubyte[] encode(dchar[] str, string encoding, string error="strict");
>
>
> If you (or someone else) wants to write these, I'll put them in.
>
It is not a small amount of work. Perhaps I will look into how big a
problem it is (after the conference).
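
For reference, the Python behavior that the proposed signatures are
modeled on can be sketched as follows. This uses only Python's existing
built-in bytes.decode() and str.encode() calls; it is not the proposed
Phobos code, and the sample string is just an illustration. The point is
the role of the error argument: "strict" raises, while "replace" and
"ignore" substitute or drop what cannot be converted.

```python
# Python's built-in codec calls, which the proposed decode()/encode() mirror.
raw = b"caf\xe9"                     # "cafe" with e-acute, stored as Latin-1

text = raw.decode("latin-1")         # bytes -> text; errors="strict" is the default
utf8 = text.encode("utf-8")          # text -> bytes in a different encoding

# "strict" raises when the text cannot be represented in the target encoding.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Decoding has the same behavior: these Latin-1 bytes are not valid UTF-8.
try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Other error modes substitute or drop the offending character instead.
replaced = text.encode("ascii", "replace")   # e-acute becomes "?"
ignored = text.encode("ascii", "ignore")     # e-acute is dropped
```

A D version would presumably throw an exception where Python raises
UnicodeEncodeError or UnicodeDecodeError; the exact exception type is
not specified in the proposal.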
>> ----
>> Improvements to Phobos
>> ----
>>
>> The behavior of writef (and perhaps of D's formatting in general) must
>> be altered.
>>
>> Currently, printing a char[] causes D to output the raw bytes in the
>> string. As I previously mentioned, this is not a good thing. On many
>> platforms, this can easily result in garbage being printed to the screen.
>>
>> I propose changing writef to check the console's encoding, and to
>> attempt to encode the output in that encoding. Then it can simply
>> output the resulting raw bytes. Checking this encoding is a
>> platform-specific operation, but essentially every platform
>> (particularly Linux, Windows, and OS X) has a way to do it. If the
>> string cannot be encoded in that encoding, the exception thrown by
>> encode() should be allowed to propagate and terminate the program (or
>> be caught by the user). If the user wishes to avoid that exception,
>> they should call encode() explicitly themselves. For this reason,
>> Phobos will also need a function for retrieving the console's default
>> encoding made available to the user.
>
>
> There's a big problem with this - what if the output is being sent to a
> file?
Files have no inherent encoding; only the console does. In this respect,
writing to a file is different from writing to the console. The user
must explicitly provide an encoding when writing to a file; or, if they
are writing a char[], wchar[], or dchar[], the encoding will be UTF-8,
UTF-16, or UTF-32, respectively. (Writing a char[] implies an encoding,
while writing a ubyte[] does not.)
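
The console/file distinction can be sketched in Python, again only as an
illustration of the idea and not as the Phobos design. The helper names
console_encoding, encode_for_console, and write_file are hypothetical.
The console path looks up an encoding from the environment and encodes
strictly, so failures propagate; the file path takes the encoding from
the caller and never guesses.

```python
import locale
import os
import sys
import tempfile


def console_encoding():
    """Best-effort lookup of the console's encoding (hypothetical helper)."""
    # sys.stdout.encoding may be missing or None if stdout has been replaced.
    return getattr(sys.stdout, "encoding", None) or locale.getpreferredencoding(False)


def encode_for_console(text, encoding=None):
    """Encode text for the console; strict mode lets errors propagate."""
    return text.encode(encoding or console_encoding(), "strict")


def write_file(path, text, encoding):
    """Write text to a file; the caller must name the encoding explicitly."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding, "strict"))


# The same text yields different bytes depending on the encoding chosen.
sample = "caf\u00e9"
tmpdir = tempfile.mkdtemp()
latin1_path = os.path.join(tmpdir, "latin1.txt")
utf8_path = os.path.join(tmpdir, "utf8.txt")
write_file(latin1_path, sample, "latin-1")
write_file(utf8_path, sample, "utf-8")

with open(latin1_path, "rb") as f:
    latin1_bytes = f.read()   # 4 bytes: e-acute is one byte in Latin-1
with open(utf8_path, "rb") as f:
    utf8_bytes = f.read()     # 5 bytes: e-acute is two bytes in UTF-8
```

A user who wants to avoid the exception on the console path would call
the encoder directly with a different error mode, which is why the
console's encoding needs to be retrievable.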
--
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org