[phobos] std.base64 replacement
Andrei Alexandrescu
andrei at erdani.com
Wed Oct 13 17:45:38 PDT 2010
On 10/13/2010 04:48 PM, Shin Fujishiro wrote:
> No. Uh... Let's forget about ranges for the next two paragraphs.
>
> Consider decoding a base64 unit (4 chars) by hand. You may want to
> (a) pull four chars from a source, and decode them into three bytes.
> Then, (b) you'll push the decoded bytes to a destination. Done.
>
> Conversion works naturally if (a) data can be pulled from a source and
> (b) converted data can be pushed to a destination. If either of them
> can't be achieved, we need an extra cache to pool dangling data. The
> "control" I wrote meant these two points.
>
> Then, can ranges support pull and push? No, unfortunately. They are
> restricted to pull *or* push semantics. Decorator input ranges can pull
> data from source ranges, but can't push converted data to anywhere.
> Output ranges have similar inconveniences.
>
> So, ranges are not best for conversion drivers IMO. Ranges are at
> their best when used as sources and destinations. We may support
> decorator ranges, but they should not be the main API.
>
> Hey, I'm not dissing ranges nor your implementation. :-) I'm just
> afraid of people making everything ranges in the first place!
This ties into our earlier discussion about streams, and the ongoing
discussion on the newsgroup.
Shin, there's no single interface that satisfies all streaming needs. Some
streams produce data at a variable rate. For those this is best:
void read(ref ubyte[] data);
So the client would have to pull data at unpredictable lengths and deal
with it. Some streams need to hold internal buffers that are not under
the user's control. For those a straight range interface exposing ubyte[] is
enough:
@property ubyte[] front();
Some other streams work best with a user-supplied buffer of a size also
decided by the user:
size_t read(ubyte[] buffer);
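To make the three shapes concrete, here is a minimal sketch that implements each one over an in-memory byte array. The struct names (VariableRateStream, ChunkRange, FillStream) and the step/chunkSize fields are hypothetical and exist only for illustration; real streams would sit on top of files or sockets.

```d
// 1. Variable-rate stream: the stream decides how much data it hands out.
struct VariableRateStream
{
    ubyte[] source;
    size_t step = 3; // assumption: data arrives at most `step` bytes at a time

    void read(ref ubyte[] data)
    {
        immutable n = source.length < step ? source.length : step;
        data = source[0 .. n];   // length chosen by the stream, not the client
        source = source[n .. $];
    }
}

// 2. Range exposing its internal buffer, one chunk per front.
struct ChunkRange
{
    ubyte[] source;
    size_t chunkSize = 4;

    @property bool empty() const { return source.length == 0; }

    @property ubyte[] front()
    {
        immutable n = source.length < chunkSize ? source.length : chunkSize;
        return source[0 .. n];
    }

    void popFront()
    {
        immutable n = source.length < chunkSize ? source.length : chunkSize;
        source = source[n .. $];
    }
}

// 3. The client supplies the buffer and its size; the stream reports the count.
struct FillStream
{
    ubyte[] source;

    size_t read(ubyte[] buffer)
    {
        immutable n = source.length < buffer.length ? source.length : buffer.length;
        buffer[0 .. n] = source[0 .. n];
        source = source[n .. $];
        return n;
    }
}

unittest
{
    ubyte[] bytes = [1, 2, 3, 4, 5];

    auto a = VariableRateStream(bytes.dup);
    ubyte[] got;
    a.read(got);
    assert(got == [1, 2, 3]);

    auto b = ChunkRange(bytes.dup);
    assert(b.front == [1, 2, 3, 4]);
    b.popFront();
    assert(b.front == [5]);

    auto c = FillStream(bytes.dup);
    ubyte[2] buf;
    assert(c.read(buf[]) == 2);
    assert(buf[] == [1, 2]);
}
```

In the first two shapes the producer picks the chunk length; in the third the consumer does, which is why no one signature covers all three.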
Now let's talk about decorator streams, which must read from some stream
and write to another. Consider in particular those M:N ranges that
consume and produce data at different rates. Depending on M > N versus M < N _and_
on the use of one of the APIs above, the M:N decorator would have to do
its own buffering. I don't think there's a simple way out that satisfies
everyone.
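As an illustration of that buffering, here is a rough sketch of a 4:3 decorator: a Base64 decoder exposed as an input range of ubyte over any input range of characters. The names Base64Decoder, base64Decoder, and sextet are hypothetical, and this is not the interface of any proposed module. Because four characters are consumed for every three bytes produced, while front yields one byte at a time, the range has to keep a three-byte cache of its own.

```d
import std.range;

// Hypothetical helper: maps a standard-alphabet character to its 6-bit value.
uint sextet(dchar c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    throw new Exception("invalid base64 character");
}

struct Base64Decoder(R)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    private R source;
    private ubyte[3] cache;   // the extra buffer the 4:3 ratio forces on us
    private size_t pos, len;  // unread window inside `cache`

    this(R source)
    {
        this.source = source;
        refill();
    }

    @property bool empty() const { return pos == len; }
    @property ubyte front() const { return cache[pos]; }
    void popFront() { if (++pos == len) refill(); }

    // Pulls one 4-character unit from the source and decodes it into the cache.
    private void refill()
    {
        pos = len = 0;
        if (source.empty) return;
        uint bits = 0;
        size_t pad = 0;
        foreach (i; 0 .. 4)
        {
            if (source.empty) throw new Exception("truncated base64 input");
            dchar c = source.front;
            source.popFront();
            if (c == '=') { ++pad; bits <<= 6; }
            else
            {
                if (pad) throw new Exception("data after padding");
                bits = (bits << 6) | sextet(c);
            }
        }
        if (pad > 2) throw new Exception("too much padding");
        cache[0] = cast(ubyte)(bits >> 16);
        cache[1] = cast(ubyte)(bits >> 8);
        cache[2] = cast(ubyte) bits;
        len = 3 - pad;
    }
}

auto base64Decoder(R)(R source) { return Base64Decoder!R(source); }

unittest
{
    import std.array : array;
    ubyte[] expected = [0x66, 0x6f, 0x6f, 0x62]; // "foob"
    assert(base64Decoder("Zm9vYg==").array == expected);
}
```

Had the decorator been written against the read(ubyte[] buffer) shape instead, the cache would still be needed whenever the caller's buffer length is not a multiple of three.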
As for the ongoing discussion about Base64: I do see a few problems with
the current interface, though none of them is major.
1. The template parameters '+' and '/' are not justified. They should be
runtime parameters. Rule of thumb: use generic code when you stand to
profit.
2. This function:
size_t encode(Range)(in ubyte[] source, Range range);
has one issue: it forces the input to be an array although it could work
with any input range of ubyte that has a length. Suggestion:
size_t encode(R1, R2)(R1 source, R2 target);
Constrain the template in whatever way you need to keep the implementation
efficient. Ideally you should have roughly the same performance with a
ubyte[] as before.
3. The same applies to decode. This is actually more important because
you might want to decode streams of dchar. This is how many streams will
come through, even though they are technically ASCII.
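For point 3, a decode that accepts character ranges might look roughly like the sketch below. It assumes the standard alphabet with '=' padding; the signature and the constraint are illustrative guesses, not the module's actual API. Note that in Phobos the front of a string is already a dchar, so constraining on "element converts to dchar" covers string, wstring, and dstring sources alike.

```d
import std.range;
import std.string : indexOf;

// Hypothetical signature; returns the number of bytes written to `target`.
size_t decode(R1, R2)(R1 source, R2 target)
    if (isInputRange!R1 && is(ElementType!R1 : dchar)
        && isOutputRange!(R2, ubyte))
{
    enum string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t written = 0;
    while (!source.empty)
    {
        uint bits = 0;
        size_t pad = 0;
        // Pull one 4-character unit.
        foreach (i; 0 .. 4)
        {
            if (source.empty) throw new Exception("truncated base64 input");
            dchar c = source.front;
            source.popFront();
            if (c == '=') { ++pad; bits <<= 6; continue; }
            immutable idx = alphabet.indexOf(c);
            if (idx < 0 || pad) throw new Exception("invalid base64 input");
            bits = (bits << 6) | cast(uint) idx;
        }
        if (pad > 2) throw new Exception("too much padding");
        // Push the 3 - pad decoded bytes.
        foreach (size_t i; 0 .. 3 - pad)
        {
            put(target, cast(ubyte)(bits >> (16 - 8 * i)));
            ++written;
        }
    }
    return written;
}

unittest
{
    ubyte[] bytes;
    auto sink = (ubyte b) { bytes ~= b; };

    assert(decode("Zm9vYg==", sink) == 4);   // string source, front is dchar
    assert(bytes == [0x66, 0x6f, 0x6f, 0x62]);

    bytes = null;
    assert(decode("Zm9v"d, sink) == 3);      // dstring source
    assert(bytes == [0x66, 0x6f, 0x6f]);
}
```

The sink here is a delegate, which std.range's put accepts as an output range; a struct sink with value semantics would need to be passed by reference.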
I'm not saying we should use ranges everywhere, but if it doesn't really
cost anything, accepting a range is better than an array.
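Here is a minimal sketch of the encode suggested in point 2, to show that one template can serve both an array and a non-array source. The constraint is one possible choice, not the module's real one, and the code assumes the standard alphabet with '=' padding. hasLength lets the function compute the output size up front, which is what the array-only version could rely on.

```d
import std.range;

// Hypothetical signature following the suggestion in point 2.
size_t encode(R1, R2)(R1 source, R2 target)
    if (isInputRange!R1 && hasLength!R1 && is(ElementType!R1 : ubyte)
        && isOutputRange!(R2, char))
{
    enum string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    // Known before the loop starts, thanks to hasLength.
    immutable total = (source.length + 2) / 3 * 4;
    while (!source.empty)
    {
        uint bits = 0;
        size_t n = 0;
        // Pull up to three bytes into a 24-bit group.
        for (; n < 3 && !source.empty; ++n)
        {
            bits |= cast(uint) source.front << (16 - 8 * n);
            source.popFront();
        }
        // n input bytes yield n + 1 significant characters, then padding.
        foreach (size_t i; 0 .. 4)
        {
            char c = i <= n ? alphabet[(bits >> (18 - 6 * i)) & 63] : '=';
            put(target, c);
        }
    }
    return total;
}

unittest
{
    char[] text;
    auto sink = (char c) { text ~= c; };

    ubyte[] data = [0x66, 0x6f, 0x6f, 0x62];          // array source
    assert(encode(data, sink) == 8);
    assert(text == "Zm9vYg==");

    text = null;
    assert(encode(repeat(cast(ubyte) 0, 3), sink) == 4); // non-array source
    assert(text == "AAAA");
}
```

Whether this matches the array version's speed is exactly what a benchmark would have to show; a specialization for ubyte[] could be added behind the same signature if it does not.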
Regarding Daniel's approach of using char/byte-level ranges through and
through, in an ideal world I'd agree. But I fear that the implementation
would not be as efficient. (I suggest you benchmark it against
Masahiro's.) Also, practically, more often than not I'll want to work
one chunk at a time, not one byte/char at a time.
Andrei