etc.curl: Formal review begin

Jonas Drewsen jdrewsen at nospam.com
Thu Aug 25 03:07:23 PDT 2011


On 24/08/11 14.58, Johannes Pfau wrote:
> David Nadlinger wrote:
>> Now that Lars Kyllingstad's new and improved std.path has passed the
>> vote  – congratulations, Lars! –, and Jose Armando Garcia, the author
>> of the proposed logging module, is currently not available, the
>> etc.curl module by Jonas Drewsen is at the front of the review queue.
>> I have volunteered to run the formal process for inclusion with Phobos.
>>
>> etc.curl provides a high-level interface to client-side networking
>> functionality for the HTTP, FTP and SMTP protocols by wrapping libcurl
>> [1], accessing the C interface of which is already possible using
>> Phobos in the form of etc.c.curl. Prior stages of the work have
>> already been discussed in March [2], May [3], and June [4]. Thanks to
>> everybody who took the time to evaluate the library back then; as far
>> as I am aware, all raised concerns have been addressed since then.
>>
>> Based on the amount of discussion this module already underwent, I
>> hereby propose a two-week review period. Barring any objections, the
>> review starts today (Aug 18) and ends on Aug 31, followed by a one
>> week vote (Sep 1 - Sep 7).
>>
>> Code:
>>    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
>>
>> API documentation:
>>    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html
>>
>> A discussion on the wrapper has already been started on digitalmars.D
>> [5]. To keep everything in one place, please post your comments and
>> reviews there.
>>
>> At this point, I would like to invite everyone to spend some time
>> testing the module and reading its documentation and code. It is
>> essential for a working open source code review process that many
>> people participate in it, and regardless of whether you are new to D
>> or a seasoned contributor, your opinion is very valuable.
>>
>> Jonas also asked us to have a look at the list of known issues (which
>> can also be found at the top of the source file), and, if possible,
>> suggest a solution:
>>
>> ---
>> Known issues:
>>   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
>> ByChunkAsync and ByLineSync are used. This seems to be a limitation of
>> ddoc - suggestions on how to circumvent this appreciated.
>>
>> Possible improvements:
>>   * Progress may be deprecated in the future. Maybe implement a
>> replacement.
>>   * Support typed http headers - (Johannes Pfau)
>> ---
>>
>> To make sure it doesn't get lost, let me also repeat Johannes Pfau's
>> request to add proxy support from another thread ([6]) here.
>>
>> Thanks a lot for your reviewing efforts,
>> David
>>
>>
>>
>> [1] http://curl.haxx.se/libcurl/
>> [2]
>> http://www.digitalmars.com/d/archives/digitalmars/D/Curl_support_RFC_131753.html
>> [3]
>> http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_136372.html
>> [4]
>> http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_round_two_138945.html
>> [5]
>> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142661
>> [6]
>> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142175
>
>
> +++++++++++++++++++++++++++   Code Review    +++++++++++++++++++++++++
>
> Comments and examples:
>      /**
>         Example:
>         ----
> curl.onReceive = (ubyte[] data) { writeln("Got data", cast(char[])
>         data); return data.length;};
>         ----
>      */
> Maybe rewrite comments like this:
>      /**
>       * Example:
>       * ----
>       * curl.onReceive = (ubyte[] data) { writeln("Got data",
>       * cast(char[]) data); return data.length;};
>       * ----
>       */
> This way, the examples don't break the indentation. I think this
> "misaligned" example blocks make the source harder to read.
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L152
> bu -->  by; thread -->  threads
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L148
> What does the phobos style guide say for alias names? I'd argue
> outdata/indata is a type (or at least, looks like one), so -->
> OutData/InData ?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L198
> Do not call this from... -->  Warning: Do not call this from...
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L289
> CurlOption.file -->  CurlOption.writeData ?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L326
> Content of the delegate in the example should be idented.
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L345
> CurlOption.infile -->  CurlOption.readData ?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L362
> indent
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L414
> indent
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L420
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L693
> dltotal, dlnow, ultotal, ulnow -->  dlTotal, dlNow, ulTotal, ulNow ?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L432
> The C callbacks take char* pointers to the data buffers. As these
> could contain binary data, shouldn't ubyte* be used here? I know that
> the data is later cast to ubyte[], but I'm a little concerned about
> this pointer slicing: str[0..size*nmemb] . This creates a temporary
> char[] which could be invalid. Maybe not a big deal, but I think we
> should avoid that.
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L540
> It's a little strange that such properties are write-only, but I know
> that's caused by a curl limitation, so nothing we can do about that.
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L576
> CurlOption.intrface -->  CurlOption.interface, likely also wrong in
> etc.c.curl
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L595
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L584
> int port -->  ushort port? I see no need for negative values?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L645
> indent
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L662
> aften -->  after
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L739
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L779
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L817
> shouldn't those be documented, so that the user knows which methods can
> be used on Http.Result/AsyncResult?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1176
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2464
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2899
> Maybe some mixin documentation should be added manually, something like
> "Mixes in Protocol"
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1431
> I think I've seen REST APIs which use POST without data or with
> parameters embedded in the URL. Is there no simple way to tell curl to
> _not add_ any data and _not set_ the Content-Type header? But I'm not
> even sure if a POST without a BODY is valid in HTTP.
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1432
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1500
> Shouldn't delim be "&"?
> Also in the foreach loop, delim should be ignored for the first pair?
>        foreach (i, key; params.byKey()) {
>              if(i == 0)
>                  data ~= key ~ "=" ~ params[key];
>              else
>                  data ~= delim ~ key ~ "=" ~ params[key];
>          }
>
> Also, is there a reason why the user must escape the parameters
> manually? As the user has no Curl handle, that could be annoying.
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1537
>        const(char)[] k = "Content-Type";
>        res.header(k, contentType);
> -->  res.header("Content-Type", contentType); ?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1658
> if ( _true_ ||!netAllowed) return;  ?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1770
> Add "; charset=utf-8" ?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1789
> IIRC it's possible to add parameter names to delegate types in D
> to document their use, so
> void delegate(const(char)[],const(char)[]) callback)
> -->  void delegate(const(char)[] key,const(char)[] value) callback)
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1798
> if (header[0..5] == "HTTP/") {
> -->
> if (header.length>= 5&&  header[0..5] == "HTTP/") {
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1829
> if -->  of
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1959
> add a toString() method for debugging purposes?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2221
> document which functions call execute() ?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2283
> why private?
> (*v) ~= value; -->  (*v) ~= "," ~ value; ?
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2327
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2308
> Document that this still transfers all data into memory first.
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2643
> Document that Result mixes in ProtocolRequestParamsSetters and
> RequestParamsSetters
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2652
> Add this bug and all other similar bugs to a ddoc BUG: section in the
> module comment, so this doesn't get lost.
>
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2932
> https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2943
> Bounds checking: At least a assert((recipient|sender).length>  0);
> should be added.
>
> (I've skipped most of the async stuff, maybe someone else should have a
> look at that.)
>
> ++++++++++++++++++++++   API docs Review   ++++++++++++++++++++++
>
> First, huge example:
> Split this into many smaller code blocks, add some high-level overview,
> here's some inspiration in pseudo-ddoc, feel free to use this:
>
> etc.curl provides wrappers for $(LINK curls) HTTP, FTP and SMTP
> functions.
>
> There are convenience functions which can be used to make simple
> requests with a single line of code:
> ----------------------------------------
> // Simple HTTP GET
> auto result = Http.get("http://www.google.com").toString();
> ----------------------------------------
> This makes a HTTP GET request to "http://www.google.com". These
> convenience functions are always _static_ functions of the HTTP /
> FTP / SMTP structs. They return a Result/AsyncResult struct which can
> be used to set options of the request and get the result as a string or
> ubyte[] array.
> ----------------------------------------
> // Simple HTTP GET with timeout
> auto request = Http.get("http://www.google.com");
> request.connectTimeout(dur!"seconds"(10));
> auto result = request.toString();
> ----------------------------------------
> As connectTimeout and similar methods return the Result/AsyncResult
> instance, it's possible to chain those calls into a single line:
> ----------------------------------------
> auto result =
> Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();
> ----------------------------------------
> Note that the request to the remote webserver is done when you call
> toString() or similar methods that internally call execute (see $(LINK
> )). If you don't call toString() or a similar method, no request is
> made!
>
> etc.curl additionally provides asynchronous versions of these
> convenience functions: Http.getAsync() is the asynchronous version that
> will spawn a thread in the background and return a Http.AsyncResult
> immediately. You can read data from the result at later point in time.
> This allows you to start processing data before all data has been
> received by using byChunk() or byLine() on the Http.AsyncResult
> instance.
> ----------------------------------------
> // GET using an asynchronous range
> foreach (line; Http.getAsync("http://www.google.com").byLine())
> {
>      // Do some expensive processing on the line while more lines are
>      // received asynchronously in the background.
>      writeln("asyncLine: ", line);
> }
> ----------------------------------------
>
> If you need more control than the convenience functions provide, you'll
> have to use the full API:
> ----------------------------------------
> // GET with custom data receivers
> Http http = Http("http://www.google.com");
> http.onReceiveHeader =
>      (const(char)[] key, const(char)[] value) { writeln(key ~ ": " ~
> value); };
> http.onReceive = (ubyte[] data) { /+ drop +/ return
> data.length; };
> http.perform();
> ----------------------------------------
> First, we create an instance of the reference-counted Http struct. We
> then set custom delegates which are called whenever the Http instance
> receives a header or a data buffer. In this simple example, we simply
> print the headers and ignore the data. See $(LINK
> onReceiveHeader/onReceive) for more information.
> We finally start the HTTP request by calling perform(). Note that you
> cannot combine the convenience get/post/put/... functions with this
> extended API: Although you could call http.get("http://some.url")
> instead of perform, this would completely ignore the settings made to
> the Http instance! Remember that get/post/put/etc are _static_ methods.
> They cannot access this instances data.
>
> If you want to make a different request than GET with the extended API,
> you have to set the method property:
> ----------------------------------------
> // PUT with data senders
> string msg = "Hello world";
> http.onSend = (void[] data)
> {
>      if (msg.empty)
>          return 0;
>
>      auto m = cast(void[])msg;
>      size_t length = m.length;
>
>      //We assume msg fits into the data buffer, real code shouldn't do
>      //that
>      assert(data.length>  msg.length);
>      data[0..length] = m[0..$];
>      return length;
> };
> http.method = Http.Method.put;
> http.contentLength = 11; // defaults to chunked transfer if not
> specified
> http.perform();
> ----------------------------------------
> If you data doesn't fit into the buffer provided by onSend, onSend will
> be called multiple times. See $(LINK) for more information.
>
> The extended API has also functions to help you track the transfer's
> progress:
> ----------------------------------------
> // Track progress
> http.method = Http.Method.get;
> http.url = "http://upload.wikimedia.org/wikipedia/commons/"
>             "5/53/Wikipedia-logo-en-big.png";
> http.onReceive = (ubyte[] data) { return data.length; }; //Discard data
> http.onProgress = (double dltotal, double dlnow,
>                     double ultotal, double ulnow) {
>      writeln("Progress ", dltotal, ", ", dlnow, ", ", ultotal, ", ",
> ulnow); return 0;
> };
> http.perform();
> ----------------------------------------
>
> As a last example, here's how to send an email with etc.curl:
> ----------------------------------------
> // Send an email with SMTPS
> Smtp smtp = Smtp("smtps://smtp.gmail.com");
> smtp.setAuthentication("from.addr at gmail.com", "password");
> smtp.mailTo = ["<to.addr at gmail.com>"];
> smtp.mailFrom = "<from.addr at gmail.com>";
> smtp.message = "Example Message";
> smtp.perform();
> ----------------------------------------
>
> Some complete examples like these above could also be added to the
> Http/Ftp/Smtp struct documentation. You could use the examples I used
> in a reply to Walter, if you wanted to.
>
> I'd also say, at least in the examples, use braces on their own lines.
> And I'd insert a few empty lines in certain places, for example:
> ----------------------------------------
> http.onSend = (void[] data) {
>      if (msg.empty) return 0;
>      auto m = cast(void[])msg;
>      size_t length = m.length;
>      data[0..length] = m[0..$];
>      msg.length = 0;
>      return length;
> };
> ----------------------------------------
>
> doesn't look very good, I'd write it like this:
> ----------------------------------------
> http.onSend = (void[] data)
> {
>      if (msg.empty)
>          return 0;
>
>      auto m = cast(void[])msg;
>      size_t length = m.length;
>
>      //We assume msg fits into the data buffer, real code shouldn't do
>      //that
>      assert(data.length>  msg.length);
>      data[0..length] = m[0..$];
>      return length;
> };
> ----------------------------------------
>
> ++++++++++++++++++++++   Feature request   ++++++++++++++++++++++
>
> Although David already quoted my request in the initial post (thx
> David), I'll repeat it here:
>
> Could you add proxy support to the
> wrapper? All of CURLOPT_PROXY CURLOPT_PROXYPORT and CURLOPT_PROXYTYPE
> should be available. (The 'combined' CURLOPT_PROXY since 7.21.7 is too
> new, that's not even in the current ubuntu)
>

Thank you. Will chew through it slowly.

/Jonas






More information about the Digitalmars-d mailing list