[phobos] std.base64 replacement

Andrei Alexandrescu andrei at erdani.com
Wed Oct 13 17:45:38 PDT 2010


On 10/13/2010 04:48 PM, Shin Fujishiro wrote:
> No.  Uh... Let's forget about ranges for the next two paragraphs.
>
> Consider decoding a base64 unit (4 chars) by hand.  You may want to
> (a) pull four chars from a source, and decode them into three bytes.
> Then, (b) you'll push the decoded bytes to a destination.  Done.
>
> Conversion works naturally if (a) data can be pulled from a source and
> (b) converted data can be pushed to a destination.  If either of them
> can't be achieved, we need an extra cache to pool dangling data.  The
> "control" I wrote meant these two points.
>
> Then, can ranges support pull and push?  No, unfortunately.  They are
> restricted to pull *or* push semantics.  Decorator input ranges can pull
> data from source ranges, but can't push converted data to anywhere.
> Output ranges have similar inconveniences.
>
> So, ranges are not best for conversion drivers IMO.  Ranges are at
> their best when used as sources and destinations.  We may support
> decorator ranges, but they should not be the main API.
>
> Hey, I'm not dissing ranges nor your implementation. :-)  I'm just
> afraid of people making everything ranges in the first place!

This ties into our earlier discussion about streams, and the ongoing 
discussion on the newsgroup.

Shin, there's no interface to satisfy all streaming needs. Some streams 
produce data at a variable rate. For those this is best:

void read(ref ubyte[] data);

So the client would have to pull data at unpredictable lengths and deal 
with it. Some streams need to hold internal buffers that are not under 
the user's control. For those a straight range interface exposing 
ubyte[] is enough:

@property ubyte[] front();

Some other streams work best with a user-supplied buffer of a size also 
decided by the user:

size_t read(ubyte[] buffer);
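
To make the three shapes concrete, here is a minimal D sketch. The 
struct names (VariableRate, Chunked, Buffered) and the fixed chunk size 
of 3 are made up for illustration; only the three member signatures come 
from the text above.

```d
// 1. Variable-rate stream: the stream decides how much to hand out.
struct VariableRate
{
    ubyte[] pending;                    // data not yet delivered

    void read(ref ubyte[] data)
    {
        // 3 stands in for "however much happened to arrive" (assumption).
        immutable n = pending.length < 3 ? pending.length : 3;
        data = pending[0 .. n];         // caller's slice is re-pointed
        pending = pending[n .. $];
    }
}

// 2. Range over buffers the stream owns: front exposes a ubyte[] chunk.
struct Chunked
{
    ubyte[] store;
    size_t chunk;                       // must be nonzero

    @property bool empty() const { return store.length == 0; }
    @property ubyte[] front()
    {
        return store[0 .. (chunk < store.length ? chunk : store.length)];
    }
    void popFront() { store = store[front.length .. $]; }
}

// 3. Caller supplies the buffer and therefore picks the size.
struct Buffered
{
    ubyte[] pending;

    size_t read(ubyte[] buffer)
    {
        immutable n = buffer.length < pending.length
                    ? buffer.length : pending.length;
        buffer[0 .. n] = pending[0 .. n];   // copy into the caller's memory
        pending = pending[n .. $];
        return n;                           // bytes actually written
    }
}

unittest
{
    ubyte[] input = [1, 2, 3, 4, 5];

    auto a = VariableRate(input);
    ubyte[] got;
    a.read(got);
    assert(got.length == 3);

    auto b = Chunked(input, 2);
    assert(b.front.length == 2);

    auto c = Buffered(input);
    ubyte[4] buf;
    assert(c.read(buf[]) == 4);
    assert(c.read(buf[]) == 1);
}
```

The difference that matters for a decorator is who owns the memory and 
who picks the length: the stream in (1) and (2), the caller in (3).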

Now let's talk about decorator streams, which must read from some stream 
and write to another. In particular, those M:N ranges that produce and 
consume data at different rates. Depending on M > N versus M < N _and_ 
on the use of one of the APIs above, the M:N decorator would have to do 
its own buffering. I don't think there's a simple way out that satisfies 
everyone.
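
As an illustration of that buffering, here is a rough sketch of a 4:3 
decorator: an input range that pulls four base64 characters from its 
source and exposes the decoded bytes one at a time. Base64Decoder and 
sextet are hypothetical names, not part of any proposed std.base64 API, 
and the code assumes well-formed input using the standard alphabet.

```d
import std.range.primitives : isInputRange, ElementType,
                              empty, front, popFront;

// Hypothetical helper: 6-bit value of one standard-alphabet character.
uint sextet(dchar c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    throw new Exception("invalid base64 character");
}

struct Base64Decoder(R)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    private R source;
    private ubyte[3] cache;     // the decorator's own buffer
    private size_t pos, len;    // cache[pos .. len] is still unread

    this(R source)
    {
        this.source = source;
        refill();
    }

    @property bool empty() const { return pos == len; }
    @property ubyte front() const { return cache[pos]; }
    void popFront() { if (++pos == len) refill(); }

    // Pull one 4-character unit and decode it into the cache.
    private void refill()
    {
        pos = len = 0;
        if (source.empty) return;
        uint bits = 0;
        size_t got = 0;         // non-padding characters seen
        foreach (i; 0 .. 4)
        {
            if (source.empty) throw new Exception("truncated base64 input");
            immutable dchar c = source.front;
            source.popFront();
            bits <<= 6;
            if (c != '=') { bits |= sextet(c); ++got; }
        }
        if (got < 2) throw new Exception("malformed base64 unit");
        len = got - 1;          // 4 chars -> 3 bytes, 3 -> 2, 2 -> 1
        cache[0] = cast(ubyte)(bits >> 16);
        cache[1] = cast(ubyte)(bits >> 8);
        cache[2] = cast(ubyte) bits;
    }
}

unittest
{
    import std.algorithm.comparison : equal;
    // "Zm9vYmE=" is the RFC 4648 encoding of "fooba".
    assert(equal(Base64Decoder!string("Zm9vYmE="),
                 [0x66, 0x6f, 0x6f, 0x62, 0x61]));
}
```

The three-byte cache is exactly the extra state Shin describes: the 
range can pull from its source, but it has nowhere to push the decoded 
bytes, so it has to park them until the client asks.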

About the ongoing discussion about Base64: I do see a few problems with 
the current interface, although none of them is major.

1. The template parameters '!' and '/' are not justified. They should be 
runtime parameters. Rule of thumb: use generic code when you stand to 
profit.

2. This function:

size_t encode(Range)(in ubyte[] source, Range range);

has one issue: it forces the input to be an array, although it could 
work with any input range of ubyte that has a length. Suggestion:

size_t encode(R1, R2)(R1 source, R2 target);

Constrain the template in whatever way keeps the implementation 
efficient. Ideally you should get roughly the same performance with a 
ubyte[] as before.
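
One possible reading of that suggestion, as a sketch only: the 
constraint below (input range of ubyte with length, output range of 
char) and the hard-coded standard alphabet are assumptions, and the body 
is a plain reference loop rather than a tuned implementation.

```d
import std.range.primitives : isInputRange, isOutputRange, hasLength,
                              ElementType, empty, front, popFront, put;

// Encodes `source` into `target`; returns the number of characters written.
size_t encode(R1, R2)(R1 source, R2 target)
    if (isInputRange!R1 && hasLength!R1 && is(ElementType!R1 : ubyte)
        && isOutputRange!(R2, char))
{
    static immutable alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // hasLength lets us report (or preallocate) the output size up front.
    immutable total = (source.length + 2) / 3 * 4;

    while (!source.empty)
    {
        uint bits = 0;
        size_t n = 0;                   // bytes gathered for this unit
        for (; n < 3 && !source.empty; ++n)
        {
            bits |= cast(uint) source.front << cast(uint)(16 - 8 * n);
            source.popFront();
        }
        put(target, alphabet[bits >> 18]);
        put(target, alphabet[(bits >> 12) & 63]);
        put(target, n > 1 ? alphabet[(bits >> 6) & 63] : '=');
        put(target, n > 2 ? alphabet[bits & 63] : '=');
    }
    return total;
}

unittest
{
    import std.array : appender;
    import std.range : chain;

    ubyte[] head = [0x66, 0x6f, 0x6f];  // "foo"
    ubyte[] tail = [0x62, 0x61, 0x72];  // "bar"

    // chain() is not an array, but it is an input range with length.
    auto sink = appender!string();
    assert(encode(chain(head, tail), sink) == 8);
    assert(sink.data == "Zm9vYmFy");
}
```

A real implementation would likely add a static if fast path for 
ubyte[] so that arrays keep their current performance.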

3. Same discussion about decode. This is actually more important because 
you might want to decode streams of dchar. This is how many streams will 
come through, even though they are technically ASCII.
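
A sketch of what a decode accepting such streams could look like. The 
signature mirrors the encode suggestion above; the constraint on 
ElementType, the sextet helper, and the choice to stop at the first '=' 
are all assumptions made for the example.

```d
import std.range.primitives : isInputRange, isOutputRange, ElementType,
                              empty, front, popFront, put;

// Hypothetical helper: 6-bit value of one standard-alphabet character.
uint sextet(dchar c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    throw new Exception("invalid base64 character");
}

// Decodes any range whose elements convert to dchar (string, dstring,
// a decoded text stream, ...); returns the number of bytes written.
size_t decode(R1, R2)(R1 source, R2 target)
    if (isInputRange!R1 && is(ElementType!R1 : dchar)
        && isOutputRange!(R2, ubyte))
{
    size_t written = 0;
    uint bits = 0;
    uint count = 0;                     // sextets collected so far
    for (; !source.empty; source.popFront())
    {
        immutable dchar c = source.front;
        if (c == '=') break;            // padding ends the data
        bits = (bits << 6) | sextet(c);
        if (++count == 4)
        {
            put(target, cast(ubyte)(bits >> 16));
            put(target, cast(ubyte)(bits >> 8));
            put(target, cast(ubyte) bits);
            written += 3;
            bits = 0;
            count = 0;
        }
    }
    if (count == 3)                     // 18 bits -> 2 bytes
    {
        put(target, cast(ubyte)(bits >> 10));
        put(target, cast(ubyte)(bits >> 2));
        written += 2;
    }
    else if (count == 2)                // 12 bits -> 1 byte
    {
        put(target, cast(ubyte)(bits >> 4));
        written += 1;
    }
    else if (count == 1)
        throw new Exception("truncated base64 input");
    return written;
}

unittest
{
    import std.array : appender;

    auto sink = appender!(ubyte[])();
    assert(decode("Zm9vYmE="d, sink) == 5);     // dstring source
    assert(sink.data == [0x66, 0x6f, 0x6f, 0x62, 0x61]);
}
```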

I'm not saying we should use ranges everywhere, but if it doesn't really 
cost anything, accepting a range is better than an array.

Regarding Daniel's approach with char/byte level ranges through and 
through, in an ideal world I'd agree. But I fear that the implementation 
would not be as efficient. (I suggest you benchmark it against 
Masahiro's.) Also, practically, more often than not I'll want to work 
one chunk at a time, not one byte/char at a time.


Andrei

