etc.curl: Formal review begin

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon Aug 29 11:55:33 PDT 2011


On 8/17/11 6:12 PM, David Nadlinger wrote:
> Now that Lars Kyllingstad's new and improved std.path has passed the
> vote – congratulations, Lars! –, and Jose Armando Garcia, the author of
> the proposed logging module, is currently not available, the etc.curl
> module by Jonas Drewsen is at the front of the review queue. I have
> volunteered to run the formal process for inclusion with Phobos.
[snip]

My review of etc.curl follows.

================
Overall
================

The library is comprehensive but does not cater to frequent use cases. 
It could be metaphorically compared with a remote control with 
equally-sized and spaced buttons, as opposed to one with enlarged 
frequent use buttons.

There are two prominent use cases:

1. Given a URL string gimme the content (binary or string). The URL 
could be even "file://".

2. Given a URL as in (1) gimme a method to go through its content one 
chunk or one line at a time. The method should be efficient (i.e. 
support asynchrony transparently such that client code and downloading 
code don't block each other).

I see these cases together covering a vast majority of use cases; (1) 
would be for quick and dirty scripts, (2) would be for applications that 
do casual networking. The only apps that need advanced APIs would be 
those for which networking is a central feature.

Neither of these use cases is covered adequately by the proposed API.

Furthermore, the documentation is also inadequate. Description is 
scarce, cross-references are missing, and examples are poor and scarce. 
A major overhaul is needed for acceptance in Phobos.

Finally, I found a couple of egregious bugs related to buffer overflow, 
both in the examples and the implementation. There should be significant 
scrutiny of such. Ideally we'd frame these issues as an API design 
problem that we may be able to design around.

My vote is contingent upon fixing these large issues. Below are more 
details along with a variety of smaller comments.

================
Documentation
================

* Should be std.curl.

* "Curl client functionality as provided by libcurl." -> "Networking 
client functionality as provided by libcurl."

* Code inlined within text should use $(D ...).

* The first example is meaningless:

Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();

It gets a string but throws it away. At best it's a test that google.com 
is up (it would throw otherwise).

* "Http http = Http("http://www.google.com");" -> "auto http = 
Http("http://www.google.com");" No stuttering please. Also Http's 
constructor should imply "http://" if not present, so "auto http = 
Http("www.google.com");"

* "foreach (line; Http.getAsync("http://www.google.com").byLine())" is 
rather verbose considering it may be the most frequent use pattern. I 
was hoping for e.g. "foreach (line; Http.byLine("google.com"))"

* This comment is inaccurate:

    // Do some expensive processing on the line while more lines are
    // received asynchronously in the background.

The idea of async in here is seldom to do expensive computation, but 
rather to offer incremental processing and output (and sometimes to stop 
early).

* In the "put with data senders" it's unclear what the length of data is 
and what happens if that length is shorter than msg.length. Setting 
contentLength to 11 is awkward. Surely there must be an easier API for 
sending some string up the wire.

* "Smtp smtp = Smtp("smtps://smtp.gmail.com");" -> "auto ..."

* "This documentation should really be in the Http..." -> we need to 
find a solution to this prior to approval. Perhaps version(StdDdoc) may 
be of help.

* Documentation for individual symbol lacks examples. For example, one 
has no idea what requestPause is about, and there's no cross-reference 
sending to the function that uses or returns it. Each symbol or symbol 
group should have examples.

* Documentation is extremely scarce and uses terms without defining 
them. It's reasonable to assume what escape() and unescape() do in such 
a module, but for example cleanup() says "Stop and invalidate this 
instance." which gives no indication of the meaning of invalidation. 
Also, isStopped() checks whether the object is "stopped and invalidated" 
but effecting stopping and invalidation is not done by stop(), but by 
cleanup().

* verbose, dataTimeout etc. don't have corresponding properties for reading.

* netInterface takes a string but not found ubytes (or an IP struct). 
this means someone who has the IP in another format must format it as a 
string first.

* localPort() should take a ushort, not an int.

* Phrases such as "tcp nodelay socket option" should link to relevant 
off-site documentation or a glossary entry.

* onSend example is unindented, uses "l" as a symbol, and is liable to 
buffer overflow, which is, to be brutally honest, appalling.

* The example for onReceive should use to!(const(char)[]) instead of the 
cast, such that the UTF validity is checked.

* Since neither value in onProgress can be fractional, the type should 
be size_t or ulong.

* Due to probably a bug in ddoc, the members of struct Http are not 
indented.

* The informal examples given with addHeader and setHeader should also 
be supported by code examples.

* Example for head() misses a semicolon.

* Documentation for post() makes it easy to overlook the important 
distinction between the two overloads. The documentation should explain 
how one is meant for text upload and the other for binary upload.

* optionsAsync has an extra paren.

* The various HTTP primitives wrapped (del, connect, ...) should include 
links to their definition in the documentation.

* The dox for the two postData() overloads should be merged.

* Until the documentation for perform() comes about towards the end of 
the dox, it's unclear that a lot of work is done by it. So someone who 
wants to understand the library without having first learned curl would 
need to infer this fact from the scarce examples.

* Again, many properties have setters but not getters (e.g. maxRedirects).

* The word "simply" is overused.

================
Implementation
================

* We usually import modules in alphabetical order.

* The frequent pattern if (!expr) throw new CurlException(msg); should 
be replaced by enforce(expr, new CurlException(msg));

* throwOnStopped should accept a defaulted string. There are several 
instances of the pattern if (stopped) throw new CurlException("...");

* I think we can in the future replace curl_easy_(un)escape with more 
efficient native functions.

* This kind of stuff is unsafe: cast(char*)(username ~ ":" ~ password) 
because the result of the concatenation is not necessarily 
zero-terminated. Please make a pass through all cast instances making 
sure the code appends a \0.

* Some lines are long enough to not be visible in git's code viewer.

* More buffer overflow opportunities, e.g. in line 1798: "if 
(header[0..5] == "HTTP/")" should be "if (header.startsWith("HTTP/"))"



Andrei


More information about the Digitalmars-d mailing list