Second Round CURL Wrapper Review

Sat Dec 17 17:17:17 PST 2011

On Saturday, December 17, 2011 12:56:02 jdrewsen wrote:
> On Thursday, 15 December 2011 at 07:46:56 UTC, Jonathan M Davis
> 
> > I'd argue that the acronyms should be in all caps if
> > camelcasing would require that the first letter of the acronym
> > be capitalized and all lower case if camelcasing would require
> > that the first letter of the acronym be lower case. We're not
> > completely consistent in how we using acronyms in names in
> > Phobos, but I believe that we primarily follow that rule when
> > putting them in symbol names. So, for instance, it would be
> > HTTP, FTP, and SMTP rather than Http, Ftp, and Smtp.
> 
> I like Http better as personal taste. But if HTTP is preferred
> then I'll do that. Actually I think I will make a pull request to
> extend the dlang.org/dstyle.html doc with and example that we can
> point to when someone asks about styling. I've spend too much
> time restyling because there is no such style doc already and
> people are complaining about style in reviews anyway.

I actually have such a pull request, but it's been languishing because there 
are some items in it that need to be discussed. I keep forgetting to bring 
them up in the Phobos news group so that they can be decided and we can move 
on with that.

https://github.com/D-Programming-Language/d-programming-language.org/pull/16

> > I'd rename T to C or Char in the template parameters, since
> > it's a character type rather than a generic type. Not a huge
> > deal, but I think that it makes the code quicker to understand.
> 
> I've named it T because it can also be ubyte. In that case C for
> Char is confusing.

Yeah. I got that after reading the source, but the needs to be clearer in the 
documentation. Normally, when something is templated the way that that is, 
it's going to be templated on character type. And something which can be 
either ubyte or char but nothing else is rather abnormal. It needs to be 
clearer. And it's possible that at some point, it really should be changed to 
be any character type and ubyte rather than just char or ubyte (depending on 
how that affect efficiency - basically if it's more efficient to make it work with 
dchar from the get-go rather then get a string and convert it, then it's more 
desirable to make it work with wchar and dchar, but if you're just going to 
have to duplicate it or convert it anyway, then you might as well just 
restrict it to char and ubyte).

> > Why are some parameters const(char)[] and others string? Why
> > not make them all const(char)[] or even actually templatize
> > them on character type (one per parameter which is a string)?
> 
> The parameters that are const(char)[] will accept both string and
> char[] which is what I want because libcurl make an internal copy
> of it anyway. If I typed it as string then you would have to idup
> a char[] for the parameter without any reason.
> The parameters that are string is internally passed to functions
> (in other std modules) that only accept strings. I could accept
> const(char)[] and idup it in order to have consistant parameter
> types but that gives an unnecessary overhead.

Okay. In general, functions should either take string or take a range of 
dchar. Taking a range of dchar is the most flexible, so it's generally the best 
(though this often results in specializations within the template for narrow 
strings - std.algorithm does that a lot). If you need a string spefically (e.g. 
all of the std.string functions specifically operate on strings), then taking 
an array that's templated on character type is generally better, because it's 
more flexible.

If you need a string internally, then do _not_ use const(char)[]. or const(C)
[]. When you do that, you're forced to idup the string (or to!string is forced 
to idup it). It's far better to either take string (in which case the caller 
will idup if it's necessary) or to not use const. That way, the string is 
copied only when it actually needs to be copied.

That does sometimes result in functions which take string for some parameters 
and const(char)[] for others, which looks odd to the programmer calling it, 
but that's life I guess. It's less of an issue when ranges are used, however, 
because then they're all ranges and it's just how theyre treated internally, I 
guess.

I don't know how well you've done with this in general without looking over 
the code again. I know that you've generally taken const(char)[] instead of 
string, and sometimes you've iduped that. And those parameters needs to be 
fixed to take string and use to!string so that the iduping isn't always 
necessary. Also, since in most cases, you're ultimately passing the strings to 
the C curl API, the benefit of operating on ranges of dchar instead of strings 
is debatable, so it's not necessarily a problem that you're not taking ranges 
of dchar. You do, however, need to make sure that you use string rather than 
const(char)[] in places where you're iduping so that we can avoid such 
unnecessary allocations.

> > Shorten the URL in your examples. Its long length increases the
> > risk of making the examples too wide. It's a made up URL, so
> > there's no need for it to be that long. Something like
> > foo.bar.com is probably more than enough.
> 
> Actually the long urls point to a google appspot app that fits
> the example. I've done it this way so that people can just
> copy/paste the example and have something working. Not too many
> people have a test HTTP POST server setup for a quick test AFAIK.

I thought that it was fake. Well, if you can, use a shorter URL. Like dlang 
instead of d-programming-language now that we havethat domain.

> > Why does byLine use '\x0a' instead of '\n'? '\n' is clear,
> > whereas most people will have to look up what '\x0a' is, so I
> > really think that it should be '\n'.
> 
> I looked at how it was done elsewhere in phobos and did like that:
> http://dlang.org/phobos/std_stdio.html#byLine
> 
> But I agree with you that \n is better and will change it.

std.stdio should be changed then IMHO.

> > In general, the functions should do a better job of documenting
> > their parameters. Many of them don't document their parameters
> > at all. For instance, while it's fairly easy to guess what
> > keepTerminator and terminator are used for, there is no
> > explanation about them whatsoever in either byLine or
> > byLineAsync's documentation.
> 
> Again I looked at phobos to see what was current the style:
> http://dlang.org/phobos/std_stdio.html#byLine
> 
> I'll include some more detail though.

Phobos' documentation can definitely be improved in some cases, and I think 
that this is one of them.

> > If any of the range return types of these functions have
> > non-standard functions on them, they need to be properly
> > documented. e.g. wait(Duration) is mentioned in passing in
> > byLineAsync's documentation, but no explanation for its purpose
> > is given. The documentation needs to explain everything the
> > programmer needs to know about the return types. For most
> > ranges, it's enough to say that the return type is a range of
> > X, but if the return type has anything other than the standard
> > range functions, then just saying that the function returns a
> > range of X is not enough. The extra functions must be listed
> > and explained.
> 
> This is what is currently documented: "If no data is available
> and the main thread access the range it will block
> until data becomes available. An exception to this is the $(D
> wait(Duration))
> method on the range. This method will wait at maximum for the
> specified
> duration and return true if data is available."
> 
> I guess that explains it?

I really think that you need to state specifically that the returned range has 
an the additional wait(Duration) function on it and explain directly what it's 
for and how it's used. Also, put it in the example. The way it states it by 
saying "An exception to this is..." makes the fact that wait(Duration) is 
mentioned at all seem like an afterthought, and while it does give some 
explanation for it, I really don't think that it's clear enough how it's used 
and what it does. It's enough to start to get the idea, but not enough to 
actually use it. An example would probably be the biggest help in that regard, 
but I do think that the function needs to be explicitly documented with a 
description of what it does rather than seemingly mentioning it as an 
afterthought - especially since it also raises the question of what other non-
standard functions the range has that aren't mentioned.

> > I'd warn you that the fact that the documentation for Protocol
> > is showing up in the documentation is likely a bug, since
> > Protocol is private - probably the bug that all templates are
> > currently public even if they're marked private. So, either
> > Protocol needs to be public (or maybe protected), or it's going
> > to disappear from the documentation eventually. Another option
> > would be to do something similar to
> > std.container.TotalContainer and declare a struct which is
> > intended only for documentation (probably in a version(StdDdoc)
> > block), and put all of the protocol documenation on _it_
> > instead of the mixin.
> 
> Ok. didn't know about the private template bug. I really think it
> is a bug in ddoc that it should be able to insert mixed in docs
> ie. the Protocol documentation should be mixed into the Http
> documentation.
> 
> Anyway - I don't see the TotalContainer docs showing in the
> generated html?
> 
> Another workaround is to manually copy the Protocol docs into the
> HTTP/FTP/SMTP structs inside a version(StdDdoc).

We have at least 2 bugs here. One, that templates are always public, and two, 
that the documentation isn't mixed in. As for TotalContainer, you appear to be 
right. I'm in such a habit of reading the code rather than the docs, that I 
missed that. It's using /* rather than /**, so it doesn't end up in the ddoc.

The best would be if the functions were on their approriate types with StdDoc, 
but at present, that means duplicating all of that documentation, which would 
kind of suck. So, it's up to you. From the perspective of those reading the 
documentation, it would definitely be better if it were on the appropriate 
types though, so I would be inclined to argue that they should go on the types 
and that we can remove them when the mixin bug is fixed, but I don't know what 
others would think about that. At minimum, it needs to be more clearly 
explained that Protocol is a template mixin that is being mixed into Http, 
Ftp, and Smtp, and those have all of Protocol's functions.

- Jonathan M Davis