Streaming library

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Oct 14 21:24:00 PDT 2010


On 10/14/10 21:22 CDT, Rainer Deyke wrote:
> On 10/14/2010 15:49, Andrei Alexandrescu wrote:
>> Good point. Perhaps indeed it's best to only deal with bytes and
>> characters at transport level.
>
> Make that just bytes.
>
> Characters data must be encoded into bytes before it is written and
> decoded before it is read.  The low-level OS functions only deal with
> bytes, not characters.

I'm not so sure about that. For example, some code in std.stdio is 
dedicated to supporting fwide():

http://www.opengroup.org/onlinepubs/000095399/functions/fwide.html

As far as I understand, a wide stream is essentially a UCS-2 (or 
UTF-16? Not sure) stream that is impossible to abstract away as a stream 
of bytes.
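
To make the notion of stream orientation concrete, here is a minimal sketch in C++ using the standard fwide() from <cwchar>. The struct OrientationProbe and the function probe_orientation are hypothetical names introduced only for illustration; they are not part of std.stdio or of any library. The sketch only shows that a stream starts with no orientation and that the orientation, once set, cannot be changed by a later fwide() call. It says nothing about which encoding a given platform uses for wide streams.

```cpp
#include <cstdio>
#include <cwchar>

// Hypothetical helper type: records what fwide() reports at each step.
struct OrientationProbe {
    int before;             // 0 means the stream has no orientation yet
    int after_wide;         // > 0 means the stream is wide-oriented
    int after_byte_attempt; // result of trying to switch to byte orientation
};

// Hypothetical helper: expects a freshly opened stream with no prior I/O.
OrientationProbe probe_orientation(std::FILE* f) {
    OrientationProbe p;
    p.before = std::fwide(f, 0);              // mode 0: query only
    p.after_wide = std::fwide(f, 1);          // mode > 0: request wide orientation
    p.after_byte_attempt = std::fwide(f, -1); // mode < 0: request byte orientation;
                                              // no effect once orientation is set
    return p;
}
```

On a freshly opened stream, before should be 0 and both later values should be positive: the request for byte orientation is ignored because the stream is already wide-oriented.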

I see Windows' commitment to fwide is... odd:

http://msdn.microsoft.com/en-us/library/aa985619%28VS.80%29.aspx

The ultimate question is whether we want to support that (as well as 
other dedicated text streams) or not.

> Text encoding is a complicated process - consider different unicode
> encodings, different non-unicode encodings, byte order markers, and
> Windows versus Unix line endings.  Furthermore, it is often useful to
> wedge an additional translation layer between the low-level (binary)
> stream and the high-level text encoding layer, such as an encryption or
> compression layer.
>
> Writing characters directly to streams made sense in the pre-Unicode
> world where there was a one-to-one correspondence between characters and
> bytes.  In a modern world, text encoding is an important service that
> deserves its own standalone module.

I'd say quite the opposite. Since now encodings are embedded all the way 
down at the low level (per fwide above), we can't pretend it's all bytes 
down there and leave characters to upper layers. There _are_ transports 
that deal with characters directly.

So the $1M question is, do we support text transports or not?

- fwide streams

- files for which isatty() returns true 
(http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html)

- email protocol and probably other Internet protocols

- others?

If we don't support text at the transport level, things can still be made 
to work but in a more fragile manner: upper-level protocols will need to 
_know_ that although the API accepts any ubyte[], the transport will 
malfunction if anything other than properly encoded text is passed. 
A text-based transport would clarify at the type level that a text 
stream accepts only UTF-encoded characters.
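
As a rough illustration of what "clarify at the type level" could mean, here is a sketch in C++. Every name in it (ByteTransport, TextTransport, Utf8Text, is_valid_utf8) is hypothetical and made up for this example; none of it is a proposed or existing std.stdio API. The assumption is that "UTF-encoded" means well-formed UTF-8. The byte transport takes any bytes, while the text transport takes only a Utf8Text, and a Utf8Text with content can be obtained only through a validating factory.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical byte transport: accepts any sequence of bytes.
struct ByteTransport {
    std::vector<std::uint8_t> sent;
    void write(const std::uint8_t* data, std::size_t n) {
        sent.insert(sent.end(), data, data + n);
    }
};

// Returns true if s is well-formed UTF-8 (rejects overlong forms,
// surrogates, and code points above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        std::uint32_t cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;           // stray continuation or invalid lead byte
        if (i + len > n) return false; // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false; // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate
        if (cp > 0x10FFFF) return false;                // out of range
        i += len;
    }
    return true;
}

// Hypothetical text type: non-empty content enters only through make().
class Utf8Text {
public:
    Utf8Text() {} // the empty string is trivially valid
    // Stores raw in out and returns true only if raw is well-formed UTF-8;
    // on failure out is left unchanged.
    static bool make(const std::string& raw, Utf8Text& out) {
        if (!is_valid_utf8(raw)) return false;
        out.data_ = raw;
        return true;
    }
    const std::string& bytes() const { return data_; }
private:
    std::string data_;
};

// Hypothetical text transport: its write() cannot be handed raw bytes.
struct TextTransport {
    std::string sent;
    void write(const Utf8Text& t) { sent += t.bytes(); }
};
```

With this split, passing a raw buffer to TextTransport::write is a compile-time error, whereas ByteTransport::write accepts anything and leaves it to the caller to know what the other end expects.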

I think either way is not a catastrophe. We can make it work.


Andrei

