etc.curl: Formal review begin

Tue Aug 30 11:10:52 PDT 2011

On Tuesday, August 30, 2011 10:57 dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s
> 
> > It is a continuous source of surprise to me that even seasoned
> > programmers don't realize that this is an inefficient copy routine:
> > while (read(source, buffer))
> >     write(target, buffer);
> > 
> > If the methods are synchronous and the speeds of source and target are
> > independent, the net transfer rate of the routine is R1*R2/(R1+R2),
> > where R1 and R2 are the transfer rates of the source and destination
> > respectively. In the worst case R1=R2 and the net transfer rate is half
> > that.
> > 
> > This is an equation very easy to derive from first principles but many
> > people are very incredulous about it. Consequently, many classic file
> > copying programs (including cp; I don't know about wget or curl) use the
> > inefficient method. As the variety of data sources increases (SSD,
> > magnetic, networked etc) I predict async I/O will become increasingly
> > prevalent. In an async approach with a queue, transfer proceeds at the
> > optimal speed min(R1, R2). That's why I'm insisting the async range
> > should be super easy to use, encapsulated, and robust: if people reach
> > for the async range by default for their dealings with networked data,
> > they'll write optimal code, sometimes even without knowing it.
> > 
> > If your article discusses this and shows e.g. how to copy data optimally
> > from one server to another using HTTP, or from one server to a file etc,
> > and if furthermore you show how your API makes all that a trivial
> > five-liner, that would be a very instructive piece.
> > 
> > Andrei
> 
> A similar situation applies when reading data and performing CPU-intensive
> processing on it line-by-line, chunk-by-chunk, etc. If the processing
> takes as long as the reading, you'll take a 50% speed hit for no good
> reason. This was the motivation behind std.parallelism.asyncBuf.
> 
> Of course, asyncBuf is unsafe. I/O-related Phobos modules should provide
> safe, encapsulated solutions rather than requiring the user to use
> std.concurrency, std.parallelism or core.thread manually. Any of these
> represents poor encapsulation; the latter two are inherently unsafe, and
> std.concurrency is often not flexible enough to implement efficient async
> I/O without breaking its safety guarantees with casts. Speaking of which,
> I should probably get busy with the
> std.stdio.File.byLineAsync/byChunkAsync pull request I've been meaning to
> make. What's good for HTTP is probably good for file I/O as well, and all
> the low-level concurrency code to make this happen is already in
> std.parallelism.
> 
> In general, I think you hit the nail on the head: Safe, encapsulated,
> easy-to-use async I/O should be a standard feature for all I/O-related
> Phobos modules.

std.file.copy is synchronous. Would the suggestion then be to change it to be
asynchronous, or to create a second function (e.g. copyAsync) which does an
asynchronous copy?
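
For reference, the rates discussed in the quoted text can be worked out in a
few lines. This is only a sketch under simplifying assumptions that are mine,
not the original posters': data moves in chunks of size B, R_1 and R_2 are the
steady rates of the source and the target, per-call overhead is ignored, and
the async queue is deep enough that neither side stalls needlessly.

```latex
% Synchronous loop: each chunk of size B is read, then written, with no overlap.
t_{\text{sync}} = \frac{B}{R_1} + \frac{B}{R_2}
\quad\Longrightarrow\quad
R_{\text{sync}} = \frac{B}{t_{\text{sync}}} = \frac{R_1 R_2}{R_1 + R_2}

% Equal rates, R_1 = R_2 = R:
R_{\text{sync}} = \frac{R^2}{2R} = \frac{R}{2}

% Overlapped read and write with a queue: the slower side sets the pace.
t_{\text{async}} \approx \max\!\left(\frac{B}{R_1}, \frac{B}{R_2}\right)
\quad\Longrightarrow\quad
R_{\text{async}} \approx \min(R_1, R_2)
```

The ratio R_sync / R_async works out to max(R_1, R_2) / (R_1 + R_2), which is
smallest (one half) when the two rates are equal. That is why matched source
and target speeds are the worst case for the synchronous loop.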

- Jonathan M Davis