Review of Jesse Phillips's CSV Parser

Jesse Phillips jessekphillips+d at gmail.com
Thu Nov 10 19:17:10 PST 2011


On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:

> On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
> <jessekphillips+d at gmail.com> wrote:
>> And only one pending request, csvFormatter. I am still considering how
>> I want to approach this. I don't want to rush it in so I don't plan on
>> having it complete before voting starts.
> 
> My primary use of csv files is as a method of saving data to later graph
> / view / format in Excel. i.e.
> 
> csvWrite("filename",["x","y","z"],x,y,z);
> 
> where x, y, z are ranges with possibly different lengths and/or types.

I don't want to do anything with files, I'm thinking an output buffer 
makes the most sense (should be able to have such a buffer write to a 
file).

The x,y,z seems interesting but what about the poor sole who has a range 
of

struct {
    auto x,y,z;
}

If I'm going to make a csvFormatter than making sure something like

csvFormatter(csvReader(txt));

works is number one priority. But you have added a style to consider when 
building the interface.
 
> I also have some API design concerns. Primarily, most of the type names
> are overly broad and unclear. For example: Record, Records, Malformed
> could just as easily be part of a database API. And because of this, not
> only does one reading the code not know what the object is, but if
> namespace conflicts can occur, then code reviewers will probably make
> the wrong assumption half the time as to what Record is and not bother
> looking it up.

The reason it is name like what you would find in a database, is because 
they are the same thing. It is a row of data. In past discussion about 
existing name clashes it was concluded that was what modules, static, and 
named imports wore for.

Malformed, I just don't have any clue what to actually call it. No 
CSVMalformed doesn't help.

But then you go on to the point below, that you shouldn't even see them.
 
> Furthermore, Records and Record, are essentially internal ranges/structs
> returned by csvReader. In the tradition of std.algorithms these should
> be defined inside the csvReader function (preferably) or as CsvReader /
> CsvReader.Record.

I was going to explain how wrong you are and how I tried having all the 
options part of csvReader but it just muddies up the clean API when 
customizing something. But actually I already found the solution with my 
previous updates and incorrectly chose to move Malformed customization 
out.

Long story short the problems I see:

* The documentation still needs to exist and will be a little harder to 
do without these
* Records contains the heading field, this is abnormal for an InputRange 
and means it won't show as an auto link if it becomes a hidden range.
* Currently you can find exactly which function call result in a specific 
error, if I hide the ranges you don't.

I'm expecting that introducing CSVFormatter will require another review, 
at that time I can try making two forms of documentation to see which is 
preferred.

Thank you! Very useful food for thought.


More information about the Digitalmars-d mailing list