Curl wrapper
jdrewsen
jdrewsen at nospam.com
Wed May 18 15:29:16 PDT 2011
On 18-05-2011 16:53, Andrei Alexandrescu wrote:
> On 5/18/11 6:07 AM, Jonas Drewsen wrote:
>> Select will wait for data to be ready and ask curl to handle the data
>> chunk. Curl in turn calls back to a registered callback handler with the
>> data read. That handler fills the buffer provided by the user. If not
>> enough data has been received, a new select is performed until the
>> requested amount of data is read. Then the blocking method can return.
>
> Perhaps this would be too complicated. In any case the core
> functionality must get top attention. And the core functionality is
> streaming.
>
> Currently there are two proposed ways to stream data from an HTTP
> address: (a) by using the onReceive callback, and (b) by using
> byLine/byChunk. If either of these performs slower than the
> best-of-the-breed streaming using libcurl, we have failed.
>
> The onReceive method is not particularly appealing because the client
> and libcurl block each other: the client is blocked while libcurl is
> waiting for data, and the client blocks libcurl while inside the
> callback. (Please correct me if I'm wrong.)
>
> To make byLine/byChunk fast, the basic setup should include a hidden
> thread that does the download in separation from the client's thread.
> There should be K buffers allocated (K = 2 to e.g. 10), and a simple
> protocol for passing the buffers back and forth between the client
> thread and the hidden thread. That way, in the quiescent state, there is
> no memory allocation and either both client and libcurl are busy doing
> work, or one is much slower than the other, which waits.
>
> The same mechanism should be used in byChunkAsync or byFileAsync.
If byChunk is using a hidden thread to download into buffers, then how
does it differ from the byChunkAsync that you mention?
The current curl wrapper actually does the hidden thread trick (based on
a hint you gave me a while ago). It does not reuse buffers because I
thought that all data had to be immutable or passed by value to go
through the message passing system. I'll fix this, since this is a good
place to do some type casting that allows the buffers to be passed back
for reuse.
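
To make the buffer-reuse idea concrete, here is a minimal sketch of the K-buffer protocol described above. It is written in Python rather than D only to keep it self-contained, and it is not the wrapper's actual code. The names by_chunk, downloader, K and CHUNK are made up for illustration, and a bytes object stands in for the network source.

```python
import queue
import threading

K = 4        # number of reusable buffers (hypothetical choice)
CHUNK = 8    # bytes per buffer, kept tiny for demonstration


def downloader(source: bytes, free: queue.Queue, filled: queue.Queue) -> None:
    """Hidden thread: copy `source` into recycled buffers, chunk by chunk."""
    view = memoryview(source)
    for off in range(0, len(view), CHUNK):
        buf = free.get()                  # blocks while the client holds every buffer
        part = view[off:off + CHUNK]
        buf[:len(part)] = part            # same-length slice assignment, no resize
        filled.put((buf, len(part)))
    filled.put(None)                      # end-of-stream marker


def by_chunk(source: bytes):
    """Client side: yield chunks, handing each buffer back after use."""
    free, filled = queue.Queue(), queue.Queue()
    for _ in range(K):
        free.put(bytearray(CHUNK))        # the only buffers this sketch creates
    threading.Thread(target=downloader, args=(source, free, filled),
                     daemon=True).start()
    while (item := filled.get()) is not None:
        buf, n = item
        yield memoryview(buf)[:n]         # only valid until the next iteration
        free.put(buf)                     # return the buffer for reuse
```

Each yielded chunk is a view into a recycled buffer, so a caller that needs the data beyond the current iteration has to copy it. If one side is much slower, the other simply blocks on its queue.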
I think that we have to consider the context of the streaming before we
can pick the best solution. I do not have any numbers to back the
following up, but this is how I see it:
If data that is read is going to be processed (e.g. compressed) in some
way it is most likely a benefit to spawn a thread to handle the data
buffering.
If no processing is done (e.g. a simple copy from net to disk) I believe
keeping things in the same thread and simply selecting on the sockets
(disk or net) is fastest. This way no message passing or context switching
takes place, so neither adds any overhead. libcurl can give you access to
the file descriptors for this exact purpose but it does have some
drawbacks: you are not in control of the buffers used by libcurl. This
means that reading from one curl connection and sending on another you
would have to copy the data. libcurl does in fact provide even simpler
methods where you can provide your own buffers for read/writes.
Unfortunately this is only supported for HTTP, and a lot of the
convenience features such as redirections are lost. The more you want to
control to squeeze out the last drop of performance, the more you have to
handle manually.
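
As a rough illustration of the same-thread approach, the sketch below waits on a socket with select and copies whatever arrives into a sink, using a single caller-owned buffer and no second thread. It uses plain Python sockets, not libcurl. The names copy_stream and demo are hypothetical, and a local socket pair stands in for a real network connection.

```python
import io
import select
import socket


def copy_stream(src: socket.socket, sink, bufsize: int = 4096) -> int:
    """Copy everything from `src` to `sink` in the calling thread."""
    buf = bytearray(bufsize)              # one buffer, reused for every read
    view = memoryview(buf)
    total = 0
    while True:
        select.select([src], [], [])      # wait until src is readable
        n = src.recv_into(view)
        if n == 0:                        # peer closed its sending side
            return total
        sink.write(view[:n])
        total += n


def demo() -> bytes:
    """Send a short message through a local socket pair and copy it out."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"hello from the network")
        a.shutdown(socket.SHUT_WR)        # signal end of stream to the reader
        out = io.BytesIO()
        copy_stream(b, out)
        return out.getvalue()
```

Because the caller owns the buffer here, nothing has to be copied out of library-managed memory first, which is exactly the control that is missing when libcurl owns the buffers.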
In my opinion, providing the performance of the standard
libcurl API in the D wrapper is the way to go (as done in the current
curl wrapper). Generic and efficient streaming across protocols is best
done in std.net, where buffers can be handled entirely in D. I know this
is not a small task, which is why I started out with wrapping libcurl.
Thanks
Jonas