etc.curl: Formal review begin

dsimcha dsimcha at yahoo.com
Tue Aug 30 10:57:37 PDT 2011


== Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s
> It is a continuous source of surprise to me that even seasoned
> programmers don't realize that this is an inefficient copy routine:
> while (read(source, buffer))
>    write(target, buffer);
> If the methods are synchronous and the speeds of source and target are
> independent, the net transfer rate of the routine is R1*R2/(R1+R2),
> where R1 and R2 are the transfer rates of the source and destination
> respectively. In the worst case R1=R2 and the net transfer rate is half
> that.
> This is an equation very easy to derive from first principles but many
> people are very incredulous about it. Consequently, many classic file
> copying programs (including cp; I don't know about wget or curl) use the
> inefficient method. As the variety of data sources increases (SSD,
> magnetic, networked etc) I predict async I/O will become increasingly
> prevalent. In an async approach with a queue, transfer proceeds at the
> optimal speed min(R1, R2). That's why I'm insisting the async range
> should be super easy to use, encapsulated, and robust: if people reach
> for the async range by default for their dealings with networked data,
> they'll write optimal code, sometimes even without knowing it.
> If your article discusses this and shows e.g. how to copy data optimally
> from one server to another using HTTP, or from one server to a file etc,
> and if furthermore you show how your API makes all that a trivial
> five-liner, that would be a very instructive piece.
> Andrei
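For reference, the two rates quoted above follow from the time spent per buffer. Here is a short derivation. It assumes a buffer of B bytes, steady rates R1 (read) and R2 (write), no per-call overhead, and a queue deep enough to absorb jitter:

```latex
% Synchronous loop: each buffer of B bytes is first read, then written.
t_{\text{sync}} = \frac{B}{R_1} + \frac{B}{R_2}
\quad\Rightarrow\quad
R_{\text{sync}} = \frac{B}{t_{\text{sync}}} = \frac{R_1 R_2}{R_1 + R_2}

% Queued (pipelined) copy: reads and writes overlap, so the slower side dominates.
R_{\text{async}} = \min(R_1, R_2)

% Efficiency of the synchronous loop relative to the optimum (uses R_1 R_2 = \min \cdot \max):
\frac{R_{\text{sync}}}{R_{\text{async}}} = \frac{\max(R_1, R_2)}{R_1 + R_2} \in \left[\tfrac{1}{2},\, 1\right)
```

The ratio reaches its minimum of 1/2 exactly when R1 = R2. For example, with R1 = R2 = 100 MB/s the synchronous loop moves 50 MB/s against 100 MB/s for the queued copy. With R1 = 100 MB/s and R2 = 25 MB/s it moves 20 MB/s against 25 MB/s.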

A similar situation applies when reading data and performing CPU-intensive
processing on it line-by-line, chunk-by-chunk, etc.  If the processing takes as
long as the reading, you'll take a 50% speed hit for no good reason.  This was the
motivation behind std.parallelism.asyncBuf.
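To illustrate the read-ahead idea, here is a minimal sketch in Python. It is not D or Phobos code. `async_buf` is a hypothetical helper that is only loosely modelled on what std.parallelism.asyncBuf does. A background thread pulls items from any iterable into a bounded queue, while the caller processes items that were fetched earlier:

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

_DONE = object()  # sentinel: the source iterable is exhausted


class _Failure:
    """Wraps an exception raised by the source so the consumer can re-raise it."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc


def async_buf(source: Iterable[T], buf_size: int = 100) -> Iterator[T]:
    """Yield the items of `source` in order while a background thread
    fetches up to `buf_size` items ahead of the consumer.

    Hypothetical helper, loosely modelled on the idea behind
    std.parallelism.asyncBuf; it is not that API.
    """
    if buf_size < 1:
        raise ValueError("buf_size must be at least 1")
    q: "queue.Queue[object]" = queue.Queue(maxsize=buf_size)

    def produce() -> None:
        try:
            for item in source:
                q.put(item)  # blocks while the buffer is full
        except BaseException as exc:
            q.put(_Failure(exc))  # hand the error to the consumer
            return
        q.put(_DONE)

    # Daemon thread: if the consumer abandons the iterator early, the
    # producer may stay blocked on q.put() until the process exits.
    threading.Thread(target=produce, daemon=True).start()

    while True:
        item = q.get()  # blocks until the producer has something ready
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item
```

Usage looks like `for line in async_buf(open("data.txt"), buf_size=64): process(line)`, where `process` stands for the CPU-bound work. In CPython the overlap pays off when the producer mostly blocks on I/O, because blocking reads release the GIL. The sketch also has an unsafety similar to the one discussed here. The source is iterated on another thread, so it must not hand out buffers that it later reuses.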

Of course, asyncBuf is unsafe.  I/O-related Phobos modules should provide safe,
encapsulated solutions rather than requiring the user to use std.concurrency,
std.parallelism or core.thread manually.  Requiring any of these represents poor
encapsulation: the latter two are inherently unsafe, and std.concurrency is often
not flexible enough to implement efficient async I/O without breaking its safety
guarantees with casts.  Speaking of which, I should probably get busy with the
std.stdio.File.byLineAsync/byChunkAsync pull request I've been meaning to make.
What's good for HTTP is probably good for file I/O as well, and all the low-level
concurrency code to make this happen is already in std.parallelism.
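The kind of encapsulated chunk reader discussed above can be sketched in Python (not Phobos) under stated assumptions. `by_chunk_async` and `copy_async` are hypothetical names chosen for illustration. The source is assumed to be any binary object with a `read(n)` method that returns `b""` at end of stream. All threading is hidden inside the reader, so the copy itself stays a plain loop:

```python
import queue
import threading
from typing import BinaryIO, Iterator, Union


def by_chunk_async(f: BinaryIO, chunk_size: int = 64 * 1024,
                   depth: int = 4) -> Iterator[bytes]:
    """Hypothetical async counterpart of a byChunk-style reader.

    A background thread reads `chunk_size`-byte chunks from `f` into a
    queue holding at most `depth` chunks, while the caller consumes
    chunks that were read earlier.
    """
    q: "queue.Queue[Union[bytes, BaseException]]" = queue.Queue(maxsize=depth)

    def read_ahead() -> None:
        try:
            while True:
                chunk = f.read(chunk_size)
                q.put(chunk)  # blocks while `depth` chunks are pending
                if not chunk:  # b"" means end of stream
                    return
        except BaseException as exc:
            q.put(exc)  # re-raised in the consumer below

    threading.Thread(target=read_ahead, daemon=True).start()

    while True:
        item = q.get()
        if isinstance(item, BaseException):
            raise item
        if not item:
            return
        yield item


def copy_async(source: BinaryIO, target: BinaryIO,
               chunk_size: int = 64 * 1024) -> int:
    """Copy `source` to `target`, overlapping the read of later chunks
    with the write of the current one. Returns the number of bytes copied."""
    copied = 0
    for chunk in by_chunk_async(source, chunk_size):
        target.write(chunk)
        copied += len(chunk)
    return copied
```

Use it with `with open("in.bin", "rb") as src, open("out.bin", "wb") as dst: copy_async(src, dst)`. The response object returned by `urllib.request.urlopen` also has `read(n)`, so the same loop would cover an HTTP-to-file copy. The bounded `depth` keeps memory use fixed while still letting reads and writes proceed at roughly the slower of the two rates.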

In general, I think you hit the nail on the head:  Safe, encapsulated, easy-to-use
async I/O should be a standard feature for all I/O-related Phobos modules.


More information about the Digitalmars-d mailing list