Streaming library
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Oct 14 21:24:00 PDT 2010
On 10/14/10 21:22 CDT, Rainer Deyke wrote:
> On 10/14/2010 15:49, Andrei Alexandrescu wrote:
>> Good point. Perhaps indeed it's best to only deal with bytes and
>> characters at transport level.
>
> Make that just bytes.
>
> Character data must be encoded into bytes before it is written and
> decoded before it is read. The low-level OS functions only deal with
> bytes, not characters.
I'm not so sure about that. For example, some code in std.stdio is
dedicated to supporting fwide():
http://www.opengroup.org/onlinepubs/000095399/functions/fwide.html
As far as I understand, a wide stream is essentially a UCS-2 (or
UTF-16? Not sure) stream that is impossible to abstract away as a stream
of bytes.
I see Windows' commitment to fwide is... odd:
http://msdn.microsoft.com/en-us/library/aa985619%28VS.80%29.aspx
The ultimate question is whether we want to support that (as well as
other dedicated text streams) or not.
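
For reference, a minimal C sketch of what stream orientation looks
like through fwide(). The helper names stream_orientation and make_wide
are made up for illustration; only fwide() itself comes from the
standard. A stream starts out unoriented, and the first fwide() call
with a nonzero mode (or the first I/O operation) fixes its orientation:

```c
#include <stdio.h>
#include <wchar.h>

/* Returns >0 if fp is wide-oriented, <0 if byte-oriented, and 0 if
   no orientation has been set yet. Mode 0 only queries the stream. */
int stream_orientation(FILE *fp)
{
    return fwide(fp, 0);
}

/* Hypothetical helper: asks for wide orientation and returns 1 if
   the stream is wide-oriented afterwards. fwide() does not change
   an orientation that has already been set. */
int make_wide(FILE *fp)
{
    return fwide(fp, 1) > 0;
}
```

Once make_wide() succeeds, a later fwide(fp, -1) leaves the stream
wide-oriented, so the choice between byte and wide I/O is made once
per stream.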
> Text encoding is a complicated process - consider different unicode
> encodings, different non-unicode encodings, byte order markers, and
> Windows versus Unix line endings. Furthermore, it is often useful to
> wedge an additional translation layer between the low-level (binary)
> stream and the high-level text encoding layer, such as an encryption or
> compression layer.
>
> Writing characters directly to streams made sense in the pre-Unicode
> world where there was a one-to-one correspondence between characters and
> bytes. In a modern world, text encoding is an important service that
> deserves its own standalone module.
I'd say quite the opposite. Since encodings are now embedded all the way
down at the low level (per fwide above), we can't pretend it's all bytes
down there and leave characters to upper layers. There _are_ transports
that deal with characters directly.
So the $1M question is, do we support text transports or not?
- fwide streams
- files for which isatty() returns true
(http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html)
- email protocol and probably other Internet protocols
- others?
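
As a rough illustration of the isatty() case, here is a small C sketch
that classifies a file descriptor. The enum and the function name
classify_fd are hypothetical, and treating every terminal as a text
transport is an assumption taken from the list above, not an
established rule:

```c
#include <unistd.h>

/* Hypothetical classification of a transport. */
enum transport_kind { TRANSPORT_BYTES, TRANSPORT_TEXT };

/* Assumption: a descriptor that refers to a terminal is treated as a
   text transport; anything else (files, pipes, invalid descriptors)
   is treated as a plain byte transport. isatty() returns 1 only for
   terminals. */
enum transport_kind classify_fd(int fd)
{
    return isatty(fd) == 1 ? TRANSPORT_TEXT : TRANSPORT_BYTES;
}
```

A caller could use classify_fd(STDOUT_FILENO) to decide whether output
headed for standard output should go through a text layer.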
If we don't support text at the transport level, things can still be
made to work, but in a more fragile manner: upper-level protocols will
need to _know_ that although the API accepts any ubyte[], the results
will be garbled if anything other than properly encoded text is passed.
A text-based transport would clarify at the type level that a text
stream accepts only UTF-encoded characters.
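
C cannot express that guarantee in the type system the way D could,
but the idea can be approximated with a wrapper that refuses anything
that is not well-formed UTF-8. The sketch below is only an
illustration: text_stream, text_stream_write and utf8_valid are
hypothetical names, and UTF-8 is assumed as the only accepted encoding.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if s[0..len) is well-formed UTF-8, 0 otherwise. */
int utf8_valid(const unsigned char *s, size_t len)
{
    /* Smallest code point allowed for each sequence length;
       anything lower is an overlong encoding. */
    static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes */
        unsigned long cp;  /* decoded code point */
        size_t k;

        if (c < 0x80)                { n = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;               /* invalid lead byte */

        if (len - i - 1 < n) return 0;  /* truncated sequence */
        for (k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min_cp[n]) return 0;                /* overlong */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate */
        if (cp > 0x10FFFF) return 0;                 /* out of range */
        i += n + 1;
    }
    return 1;
}

/* Hypothetical text transport: a byte stream that accepts only
   well-formed UTF-8. */
typedef struct { FILE *fp; } text_stream;

/* Writes len bytes if they are valid UTF-8 and returns the number of
   bytes written; returns -1 without writing anything otherwise. */
long text_stream_write(text_stream *ts, const char *s, size_t len)
{
    if (!utf8_valid((const unsigned char *)s, len))
        return -1;
    return (long)fwrite(s, 1, len, ts->fp);
}
```

The check happens at run time here; a text-typed transport would move
the same restriction to compile time.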
I think either way is not a catastrophe. We can make it work.
Andrei