Review of Jesse Phillips's CSV Parser

Jesse Phillips jessekphillips+d at gmail.com
Tue Nov 15 19:30:51 PST 2011


On Sat, 12 Nov 2011 00:21:55 -0500, Robert Jacques wrote:

> On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips
> <jessekphillips+d at gmail.com> wrote:
>> On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:
>>> On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
>>> <jessekphillips+d at gmail.com> wrote:
> [snip]
>>> I also have some API design concerns. Primarily, most of the type
>>> names are overly broad and unclear. For example: Record, Records,
>>> Malformed could just as easily be part of a database API. And because
>>> of this, not only does one reading the code not know what the object
>>> is, but if namespace conflicts can occur, then code reviewers will
>>> probably make the wrong assumption half the time as to what Record is
>>> and not bother looking it up.
>>
>> The reason it is name like what you would find in a database, is
>> because they are the same thing. It is a row of data. In past
>> discussion about existing name clashes it was concluded that was what
>> modules, static, and named imports wore for.
>>
>> Malformed, I just don't have any clue what to actually call it. No
>> CSVMalformed doesn't help.
>>
>> But then you go on to the point below, that you shouldn't even see
>> them.
>>
>>> Furthermore, Records and Record, are essentially internal
>>> ranges/structs returned by csvReader. In the tradition of
>>> std.algorithms these should be defined inside the csvReader function
>>> (preferably) or as CsvReader / CsvReader.Record.
>>
>> I was going to explain how wrong you are and how I tried having all the
>> options part of csvReader but it just muddies up the clean API when
>> customizing something. But actually I already found the solution with
>> my previous updates and incorrectly chose to move Malformed
>> customization out.
>>
>> Long story short the problems I see:
>>
>> * The documentation still needs to exist and will be a little harder to
>> do without these * Records contains the heading field, this is abnormal
>> for an InputRange and means it won't show as an auto link if it becomes
>> a hidden range.
>> * Currently you can find exactly which function call result in a
>> specific error, if I hide the ranges you don't.
>>
>> I'm expecting that introducing CSVFormatter will require another
>> review, at that time I can try making two forms of documentation to see
>> which is preferred.
>>
>> Thank you! Very useful food for thought.
> 
> Modules, static and named imports are extremely useful tools for making
> disparate third party libraries work together. But Phobos is one (big)
> library, not a bunch of separate ones. We shouldn't ever need to use
> these tools to resolve Phobos name conflicts. In fact one of the (many)
> goals of D is to scale down to quick and dirty scripts. rdmd helps with
> this a lot, but the other piece of this puzzle is 'import std.all;'
> which doesn't work today. And while I don't think we should break
> existing code just to fix this problem, going forward, as we update and
> add libraries to Phobos, we need to strive to prevent name collisions.
> And, like std.file and std.stdio, I'd expect std.csv and std.databases
> to be used together often. Furthermore, your primary case for exposing
> these structs is for documentation purposes. As the user isn't expected
> to actually instantiate these structs (or at least not often), naming
> them CsvReader and CsvReader.Row is perfectly acceptable. 

I like the point on std.all, but isn't std.csv.Rows and std.csv.Row about 
the same as your example names? And if dmd could do partial matches (not 
exactly requesting such a feature) csv.Rows, csv.Row.

> Lastly, I'd
> like to draw your attention to std.algorithm.findSplit. findSplit has to
> return three ranges, which it does so using a Tuple. Records is simply a
> manual tuple of the headings and the row ranges, so why not leverage
> Phobos for this instead?

Records requires access to the header for building the associative array. 
So while I still could return both, the range would still be holding the 
header. If you think it would make an improvement then you'd probably 
want to update your vote to no. Namely what I would see coming from that:


auto records = csvReader(txt);

records.header ...;
foreach(row; records.rows) { ... }


More information about the Digitalmars-d mailing list