Curl support RFC

Jonas Drewsen jdrewsen at nospam.com
Sun Mar 27 13:50:48 PDT 2011


On 25/03/11 12.07, Johannes Pfau wrote:
> Johannes Pfau wrote:
>> Jonas Drewsen wrote:
>>> Hi,
>>>
>>>    So I've been working a bit on the etc.curl module. Currently most
>>> of
>>> the HTTP functionality is done and some very simple Ftp.
>>>
>>> I would very much like to know if this has a chance of getting in
>>> phobos if I finish it with the current design. If not then it will be
>>> for my own project only and doesn't need as much documentation or all
>>> the features.
>>>
>>> https://github.com/jcd/phobos/tree/curl
>>>
>>> I do know that the error handling is currently not good enough... WIP.
>>>
>>> /Jonas
>>>
>>>
>>> On 11/03/11 16.20, Jonas Drewsen wrote:
>>>> Hi,
>>>>
>>>> So I've spent some time trying to wrap libcurl for D. There is a lot
>>>> of things that you can do with libcurl which I did not know so I'm
>>>> starting out small.
>>>>
>>>> For now I've created all the declarations for the latest public curl
>>>> C api. I have put that in the etc.c.curl module.
>>>>
>>>> On top of that I've created a more D like api as seen below. This is
>>>> located in the 'etc.curl' module. What you can see below currently
>>>> works but before proceeding further down this road I would like to
>>>> get your comments on it.
>>>>
>>>> //
>>>> // Simple HTTP GET with sane defaults
>>>> // provides the .content, .headers and .status
>>>> //
>>>> writeln( Http.get("http://www.google.com").content );
>>>>
>>>> //
>>>> // GET with custom data receiver delegates
>>>> //
>>>> Http http = new Http("http://www.google.dk");
>>>> http.setReceiveHeaderCallback( (string key, string value) {
>>>> writeln(key ~ ":" ~ value);
>>>> } );
>>>> http.setReceiveCallback( (string data) { /* drop */ } );
>>>> http.perform;
>>>>
>>>> //
>>>> // POST with some timouts
>>>> //
>>>> http.setUrl("http://www.testing.com/test.cgi");
>>>> http.setReceiveCallback( (string data) { writeln(data); } );
>>>> http.setConnectTimeout(1000);
>>>> http.setDataTimeout(1000);
>>>> http.setDnsTimeout(1000);
>>>> http.setPostData("The quick....");
>>>> http.perform;
>>>>
>>>> //
>>>> // PUT with data sender delegate
>>>> //
>>>> string msg = "Hello world";
>>>> size_t len = msg.length; /* using chuncked transfer if omitted */
>>>>
>>>> http.setSendCallback( delegate size_t(char[] data) {
>>>> if (msg.empty) return 0;
>>>> auto l = msg.length;
>>>> data[0..l] = msg[0..$];
>>>> msg.length = 0;
>>>> return l;
>>>> },
>>>> HttpMethod.put, len );
>>>> http.perform;
>>>>
>>>> //
>>>> // HTTPS
>>>> //
>>>> writeln(Http.get("https://mail.google.com").content);
>>>>
>>>> //
>>>> // FTP
>>>> //
>>>> writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
>>>> "./downloaded-file"));
>>>>
>>>>
>>>> // ... authenication, cookies, interface select, progress callback
>>>> // etc. is also implemented this way.
>>>>
>>>>
>>>> /Jonas
>>>
>>
>> I looked at the code again and I got 2 more suggestions:
>>
>> 1.) Would it be useful to have a headersReceived callback which would
>> be called when all headers have been received (when the data callback
>> is called the first time)? I think of a situation where you don't know
>> what data the server will return: a few KB html which you can easily
>> keep in memory or a huge file which you'd have to save to disk. You
>> can only know that if the headers have been received. It would also be
>> possible to do that by just overwriting the headerCallback and looking
>> out for the ContentLength/ContentType header, but I think it should
>> also work with the default headerCallback.
>>
>> 2.)
>> As far as I can see you store the http headers in a case sensitive way.
>> (res.headers[key] ~= value;). This means "Content-Length" vs
>> "content-length" would produce two entries in the array and it makes
>> it difficult to get the header from the associative array. It is maybe
>> useful to keep the original casing, but probably not in the array key.
>>
>> BTW: According to RFC2616 the only headers which are allowed
>> to be included multiple times in the response must consist of comma
>> separated lists. So in theory we could keep a simple string[string]
>> list and if we see a header twice we can just merge it with a ','.
>>
>> http://tools.ietf.org/html/rfc2616#section-4.2
>> Relevant part from the RFC:
>> ----------------------
>>    Multiple message-header fields with the same field-name MAY be
>>    present in a message if and only if the entire field-value for that
>>    header field is defined as a comma-separated list [i.e., #(values)].
>>    It MUST be possible to combine the multiple header fields into one
>>    "field-name: field-value" pair, without changing the semantics of
>> the message, by appending each subsequent field-value to the first,
>> each separated by a comma. The order in which header fields with the
>> same field-name are received is therefore significant to the
>>    interpretation of the combined field value, and thus a proxy MUST
>> NOT change the order of these field values when a message is
>> forwarded.
>> ----------------------
>>
>> I'm also done with the first pass through the http parsers.
>> Documentation is here:
>> http://dl.dropbox.com/u/24218791/std.protocol.http/http/http.html
>>
>> Code here:
>> https://gist.github.com/886612
>> The http.d file is generated from the http.d.rl file.
>>
>
> I added some code to show how I think this could be used in the HTTP
> client:
> https://gist.github.com/886612#file_gistfile1.d
>
> Like in the .net webclient we'd need two of these collections: one for
> received headers and one for headers to be sent.

Thanks!

It would be very nice to have in the std.protocol.http in phobos so that 
the curl stuff could use it. If that happened then 
std.protocol.{smtp,imap,....} could probably also be built on your 
framework and be added as support in the curl wrappers.

/Jonas





More information about the Digitalmars-d mailing list