Improving D's support of code-pages

BCS ao at pathlink.com
Sat Aug 18 15:43:20 PDT 2007


Reply to Kirk,

> Kirk McDonald wrote:
> 
>> Walter Bright wrote:
>> 
>>> Kirk McDonald wrote:
>>> 
>>>> ----
>>>> Additions to Phobos
>>>> ----
>>>> What Phobos needs first are the following functions. (Their
>>>> basic interface has been cribbed from Python.)
>>>> 
>>>> char[] decode(ubyte[] str, string encoding, string error="strict");
>>>> wchar[] wdecode(ubyte[] str, string encoding, string error="strict");
>>>> dchar[] ddecode(ubyte[] str, string encoding, string error="strict");
>>>> 
>>>> ubyte[] encode(char[] str, string encoding, string error="strict");
>>>> ubyte[] encode(wchar[] str, string encoding, string error="strict");
>>>> ubyte[] encode(dchar[] str, string encoding, string error="strict");
>>>> 
>>> If you (or someone else) wants to write these, I'll put them in.
>>> 
>> It is not a small amount of work. Perhaps I will take a look at how
>> big of a problem it is (after the conference).
>> 
>>>> ----
>>>> Improvements to Phobos
>>>> ----
>>>> The behavior of writef (and perhaps of D's formatting in general)
>>>> must be altered.
>>>> 
>>>> Currently, printing a char[] causes D to output the raw bytes in
>>>> the string. As I previously mentioned, this is not a good thing. On
>>>> many platforms, this can easily result in garbage being printed to
>>>> the screen.
>>>> 
>>>> I propose changing writef to check the console's encoding, and to
>>>> attempt to encode the output in that encoding. Then it can simply
>>>> output the resulting raw bytes. Checking this encoding is a
>>>> platform-specific operation, but essentially every platform
>>>> (particularly Linux, Windows, and OS X) has a way to do it. If the
>>>> string cannot be encoded in that encoding, the exception thrown by
>>>> encode() should be allowed to propagate and terminate the program
>>>> (or be caught by the user). If the user wishes to avoid that
>>>> exception, they should call encode() explicitly themselves. For
>>>> this reason, Phobos will also need to expose a function that
>>>> retrieves the console's default encoding.
>>>> 
>>> There's a big problem with this - what if the output is being sent
>>> to a file?
>>> 
>> Files have no inherent encoding; only the console does. In this way,
>> writing to a file is different from writing to the console. The user
>> must explicitly provide an encoding when writing to a file; or, if
>> they are writing a char[], wchar[], or dchar[], the encoding will be
>> UTF-8, -16, or -32. (Writing a char[] implies an encoding, while
>> writing a ubyte[] does not.)
>> 
> I should clarify this: When treating stdout like a file, it should be
> like any other file: writing to it writes raw bytes. But when calling
> writef, which is not treating it like a file, it should attempt to
> encode the output into the console's default encoding.
> 

"Stream" has a writef, so you can call writef on a file too.
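To make the point concrete, here is a minimal sketch written against present-day Phobos rather than the 2007 library. At the time, writef was available on std.stream's Stream class. Current Phobos no longer ships std.stream, so the sketch uses std.stdio.File, which also has writef. The functions decodeLatin1 and encodeLatin1 are hypothetical stand-ins for the proposed decode/encode. They handle only Latin-1 (ISO-8859-1), where each byte value equals its code point, so no conversion tables are needed. The file name out.txt is arbitrary.

```d
// Hypothetical sketch: Latin-1-only stand-ins for the proposed
// decode()/encode(); names and error handling are assumptions.
import std.stdio : File;

/// Decode Latin-1 bytes into UTF-8. Every byte 0x00..0xFF is the code
/// point of the same value, so this never fails.
char[] decodeLatin1(const(ubyte)[] bytes)
{
    char[] result;
    foreach (b; bytes)
        result ~= cast(dchar) b; // appending a dchar to char[] emits UTF-8
    return result;
}

/// Encode UTF-8 text as Latin-1, throwing (in the spirit of
/// error="strict") when a code point has no Latin-1 representation.
ubyte[] encodeLatin1(const(char)[] text)
{
    ubyte[] result;
    foreach (dchar c; text) // foreach over dchar decodes the UTF-8
    {
        if (c > 0xFF)
            throw new Exception("character not representable in Latin-1");
        result ~= cast(ubyte) c;
    }
    return result;
}

void main()
{
    immutable ubyte[] raw = [0x63, 0x61, 0x66, 0xE9]; // "café" in Latin-1
    auto f = File("out.txt", "w");
    // writef on a file: the char[] is written out as its UTF-8 bytes.
    f.writef("%s\n", decodeLatin1(raw));
    // An explicit encoding means writing raw bytes instead.
    f.rawWrite(encodeLatin1("caf\u00e9\n"));
}
```

Because writef on a regular file writes a char[] as UTF-8, making writef re-encode to the console's encoding would also change what lands in files unless the console case is handled separately.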


More information about the Digitalmars-d mailing list