Review of Jesse Phillips's CSV Parser

Tue Nov 15 20:12:57 PST 2011

On Tue, 15 Nov 2011 22:30:51 -0500, Jesse Phillips <jessekphillips+d at gmail.com> wrote:
> On Sat, 12 Nov 2011 00:21:55 -0500, Robert Jacques wrote:
>> On Thu, 10 Nov 2011 22:17:10 -0500, Jesse Phillips
>> <jessekphillips+d at gmail.com> wrote:
>>> On Mon, 07 Nov 2011 23:28:20 -0500, Robert Jacques wrote:
>>>> On Sun, 06 Nov 2011 19:00:42 -0500, Jesse Phillips
>>>> <jessekphillips+d at gmail.com> wrote:

[snip]

>> Modules, static and named imports are extremely useful tools for making
>> disparate third party libraries work together. But Phobos is one (big)
>> library, not a bunch of separate ones. We shouldn't ever need to use
>> these tools to resolve Phobos name conflicts. In fact one of the (many)
>> goals of D is to scale down to quick and dirty scripts. rdmd helps with
>> this a lot, but the other piece of this puzzle is 'import std.all;'
>> which doesn't work today. And while I don't think we should break
>> existing code just to fix this problem, going forward, as we update and
>> add libraries to Phobos, we need to strive to prevent name collisions.
>> And, like std.file and std.stdio, I'd expect std.csv and std.databases
>> to be used together often. Furthermore, your primary case for exposing
>> these structs is for documentation purposes. As the user isn't expected
>> to actually instantiate these structs (or at least not often), naming
>> them CsvReader and CsvReader.Row is perfectly acceptable.
>
> I like the point on std.all, but isn't std.csv.Rows and std.csv.Row about
> the same as your example names?

Yes and no. Yes, from the point of view of reading 'std.csv.Rows' in existing code. No, from the point of view writing code. You're much more likely to write just 'Rows' and then have to go through a quick-compile-fix-compile loop. Worse, consider the people using the colliding std.databases.Row struct, which now have to use the fully qualified name. I don't feel that overly succinct names are important to std.csv, as most users will be using 'auto records = csvReader(txt);' and will never have to type out std.csv.Rows or csvRows or whatever.

> And if dmd could do partial matches (not
> exactly requesting such a feature) csv.Rows, csv.Row.
>> Lastly, I'd
>> like to draw your attention to std.algorithm.findSplit. findSplit has to
>> return three ranges, which it does so using a Tuple. Records is simply a
>> manual tuple of the headings and the row ranges, so why not leverage
>> Phobos for this instead?
>
> Records requires access to the header for building the associative array.
> So while I still could return both, the range would still be holding the
> header. If you think it would make an improvement then you'd probably
> want to update your vote to no. Namely what I would see coming from that:
>
>
> auto records = csvReader(txt);
>
> records.header ...;
> foreach(row; records.rows) { ... }

Thank you for explaining. I wasn't seeing the implementation efficiency that comes from the custom struct approach.