Review of Jesse Phillips's CSV Parser

Fri Nov 11 21:21:55 PST 2011

On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips <jessekphillips+d at gmail.com> wrote:
> On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:
>> On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
>> <jessekphillips+d at gmail.com> wrote:
[snip]
>> I also have some API design concerns. Primarily, most of the type names
>> are overly broad and unclear. For example: Record, Records, Malformed
>> could just as easily be part of a database API. And because of this, not
>> only does one reading the code not know what the object is, but if
>> namespace conflicts can occur, then code reviewers will probably make
>> the wrong assumption half the time as to what Record is and not bother
>> looking it up.
>
> The reason it is name like what you would find in a database, is because
> they are the same thing. It is a row of data. In past discussion about
> existing name clashes it was concluded that was what modules, static, and
> named imports wore for.
>
> Malformed, I just don't have any clue what to actually call it. No
> CSVMalformed doesn't help.
>
> But then you go on to the point below, that you shouldn't even see them.
>
>> Furthermore, Records and Record, are essentially internal ranges/structs
>> returned by csvReader. In the tradition of std.algorithms these should
>> be defined inside the csvReader function (preferably) or as CsvReader /
>> CsvReader.Record.
>
> I was going to explain how wrong you are and how I tried having all the
> options part of csvReader but it just muddies up the clean API when
> customizing something. But actually I already found the solution with my
> previous updates and incorrectly chose to move Malformed customization
> out.
>
> Long story short the problems I see:
>
> * The documentation still needs to exist and will be a little harder to
> do without these
> * Records contains the heading field, this is abnormal for an InputRange
> and means it won't show as an auto link if it becomes a hidden range.
> * Currently you can find exactly which function call result in a specific
> error, if I hide the ranges you don't.
>
> I'm expecting that introducing CSVFormatter will require another review,
> at that time I can try making two forms of documentation to see which is
> preferred.
>
> Thank you! Very useful food for thought.

Modules, static and named imports are extremely useful tools for making disparate third party libraries work together. But Phobos is one (big) library, not a bunch of separate ones. We shouldn't ever need to use these tools to resolve Phobos name conflicts. In fact one of the (many) goals of D is to scale down to quick and dirty scripts. rdmd helps with this a lot, but the other piece of this puzzle is 'import std.all;' which doesn't work today. And while I don't think we should break existing code just to fix this problem, going forward, as we update and add libraries to Phobos, we need to strive to prevent name collisions. And, like std.file and std.stdio, I'd expect std.csv and std.databases to be used together often. Furthermore, your primary case for exposing these structs is for documentation purposes. As the user isn't expected to actually instantiate these structs (or at least not often), naming them CsvReader and CsvReader.Row is perfectly acceptable. Lastly, I'd like to  
draw your attention to std.algorithm.findSplit. findSplit has to return three ranges, which it does so using a Tuple. Records is simply a manual tuple of the headings and the row ranges, so why not leverage Phobos for this instead?