Review of Jesse Phillips's CSV Parser

Thu Nov 10 21:36:58 PST 2011

Thank you, lots of stuff to ponder and comment.

On Thu, 10 Nov 2011 02:10:28 -0800, Jonathan M Davis wrote:

> On Friday, October 28, 2011 09:18:27 dsimcha wrote:
>> Formal review of Jesse Phillips's CSV parser module begins today and
>> runs through November .  Following that, a vote will take place for one
>> week.  Please post any comments about this library in this thread.
>> 
>> Docs:
>> http://nascent.freeshell.org/programming/D/doc/phobos/std_csv.html
>> 
>> Code:
>> https://github.com/he-the-great/phobos/tree/csv
> 
> - Nitpick: You should probably proofread the module documentation again.
> It has several mispellings and grammar errors.

Err, I'll try, use some of the great tips you gave me :)

> - The functions and types mentioned in the documentation should use LREF
> so that they're links to their documentation.

I'll see if I can find them all while proofing.

> - Shouldn't the term "header" be used rather than "heading?" So, you'd
> have HeaderMismatchException instead of HeadingMismatchException.
> Usually, the term heading is only used for stuff like the heading of an
> article. Usually, the term header is used for stuff like files. RFC 4180
> uses the term header but never heading. So, unless I'm missing
> something, I really think that all of the instances of heading (both in
> the documentation and in the API) should be changed to header.

Used both in development, then chose one, guess I got it wrong. Changing.

> - I'd argue against Separator in csvReader being char by default.
> Individual characters should generally be dchar, not char. Using char by
> itself is almost always bad practice IMHO.

The only reason I put defaults on it was because it wouldn't choose the 
value for it from my default values. Otherwise it works with what you 
give it, be it char, wchar, dchar.

> - I'd say that csvReader should say that it returns a Records struct
> which is an input range rather than saying that it provides
> std.range.isInputRange. We just don't talk that way about ranges
> normally. We say that something is a particular range type, not that it
> provides the API for a particular range type. If you want to make "is an
> input range" a link to std.range.isInputRange, that's fine, but what's
> currently there is not how ranges are typically referred to. I'd also
> suggest that you on the range functions where you say that they're part
> of the std.range.isInputRange interface, you should use the term API
> instead of interface, since interface has a specific meaning in the
> language which has nothing to do with this context.

Thanks.

> - Why does csvNextToken take an Appender!(char[])? That seems overly
> restrictive. I'd argue that it should take an output range of dchar
> (which Appender!(char[]) is). Then it's not restricted to Appender, and
> it's not restricted to char[].

Appender is used so that the exception thrown can contain the consumed 
data. Thinking again on this, I suppose I can catch the exception add the 
data and rethrow. Sounds pretty good actually.

> - Also, the fact that you have isSomeChar!(ElementType!Range) concerns
> me somewhat. Normally, I'd expect something more like
> is(ElementType!Range == dchar). As it is, you could have a range of char
> or wchar, which could do funny things. Phobos doesn't normally support
> that, and I'd be concerned about what would happen if someone tried.
> ElementType!Range will be dchar for all string types, so they'll work
> fine either way, but programmers who are overly adventurous or ignorant
> may try and use a char or wchar range, and I'm willing to be that there
> would be problems with that. Just all around, I'd advise checking that
> the ranges are ranges of dchar and using dchar for the separators rather
> than char.

I was of a different mindset of the time. You are probably right I should 
prevent someone from eating their leg off.

> - You should have unit tests testing ranges _other_ than string. The
> fact that neither wstring nor dstring is tested is worrisome, and
> ideally, you'd do stuff like call filter!"true" on a string so that you
> have an input range which isn't a string. Without at least some of that,
> I'd be very worried that your code can't actually handle generic ranges
> of dchar and really only works with string.

Did add a wchar input range test, but now it should just be an dchar 
input range test since wchar ranges won't be allowed.

> - You should have some unit tests with unicode values - preferably some
> which are multiple code units for both char and wchar. For instance,
> std.string uses \U00010143 in some of its tests, because it's 4 code
> units in UTF-8 and 2 in UTF-16. The complete lack of testing with
> unicode is worrisome in the same way that only testing with string is
> worrisome. It's too easy to screw something up such that it happens to
> work with string and ASCII but not in general. So, please add tests
> which involve unicode - not only in the data but in the separators.

Didn't create any new tests, just added unicode in to input and as 
separators into old tests.

> - I find it somewhat odd that you would use a pointer to the range in
> Record rather than using a slice, but I'd have to examine it in more
> detail to see whether that's actually necessary or not. But in general,
> I'd argue that a slice should be used rather than a pointer if it can
> be.

Record requires modifying the range, for a true input range this is not a 
problem, but strings and some other struct based ranges are implicit 
forward ranges and save upon passing.

Slicing doesn't exist on input ranges, though I'd almost argue [] should 
take the place of save() in a forward range.

> - Jonathan M Davis
> 
> 
> P.S.  Just as a note, since you seem to like not using braces if you
> don't need to for if statements and the like, I though that I'd point
> out that try and catch don't require braces in D if htey contain only
> one statement. You have several one line catches which use braces when
> they don't need to. I'm not arguing that you should or shouldn't change
> it, but I thought that I would point out that the braces aren't
> necessary in case you didn't know (since they _are_ necessary in other
> languages).

Thanks, hopefully I'm consistent in that regard and I'll just leave it. I 
knew of try, but not catch.