[phobos] CSVRange: RFC

Andrei Alexandrescu andrei at erdani.com
Sun Jan 30 11:11:18 PST 2011


Looks like a good candidate for std.format, but I think it's a ways from 
getting there.

Code review:

#50 RefRange is really ForceInputRange. What do you need it for? It's 
unusual to want to reduce the capabilities of a range.

#78 isCharRange is incorrect. Correct version:

template isCharRange(R)
{
    enum bool isCharRange = isInputRange!R && isSomeChar!(ElementType!R);
}

#100 Why not struct?

#102 private Appender!(char[]) _front;

#305 No comment?

#306 This is a CsvRange, not a CsvFile, since it builds on another range 
(which may or may not be backed by a file).

#386 No comment?

#387 The name is confusing - it's a class with struct in its name.

#582 We also need a way to read CSV files into string arrays in case the 
user just wants to do the parsing and decide on typing later. Seemingly 
the current design forces choice of type before parsing.
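
To make the #78 note concrete, here is a minimal sketch of the corrected 
isCharRange written out as a full template, with a few compile-time checks. 
The checks are illustrative assumptions of mine, not taken from csvRange.d.

```d
import std.range : ElementType, isInputRange;
import std.traits : isSomeChar;

// True for any input range whose element type is a character type.
template isCharRange(R)
{
    enum bool isCharRange = isInputRange!R && isSomeChar!(ElementType!R);
}

// Narrow strings qualify: as ranges, their element type is dchar.
static assert( isCharRange!string);
static assert( isCharRange!(dchar[]));

// Ranges of non-characters, and non-ranges, do not.
static assert(!isCharRange!(int[]));
static assert(!isCharRange!int);

void main() {}
```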

Documentation review:

* No spellchecking (e.g. 'teh')

* Malformed Wikipedia URL.

* No need for copying the license, a URL is sufficient.

* O(1) is a bit inaccurate - memory consumed is proportional to the size 
of one element. What you might have meant is that it does not depend on 
the number of lines in a file or on the number of CSV elements in a line.

* A few artifacts have no examples.

* The example should compile. getCharRange() does not exist. FWIW your 
design should work with byLine().

* It's unclear from the code and the documentation what colHeaders does.
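
As a rough illustration of the byLine() point above: one way to get a 
character input range out of a file is to join the lines that byLine() 
yields. This is only a sketch under assumptions. fileChars, countSeparators 
and the scratch file name are hypothetical and not part of csvRange, and 
whether csvFile accepts such a range as-is is untested here.

```d
import std.algorithm : joiner;
import std.file : remove, write;
import std.range : ElementType, isInputRange;
import std.stdio : File, KeepTerminator;
import std.traits : isSomeChar;

// Hypothetical helper (not part of csvRange): flattens a file's lines into
// a single lazy input range of characters, keeping the line terminators.
auto fileChars(string path)
{
    return File(path).byLine(KeepTerminator.yes).joiner;
}

// Hypothetical helper: counts separator characters in one pass over the range.
size_t countSeparators(R)(R chars, dchar sep)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t n;
    foreach (c; chars)
        if (c == sep) ++n;
    return n;
}

void main()
{
    enum path = "csv_byline_sketch.csv"; // scratch file name, an assumption
    write(path, "Height,Weight,Shoe Size\n6.5,210,13\n");
    scope (exit) remove(path);

    // Two commas on each of the two lines.
    assert(countSeparators(fileChars(path), ',') == 4);
}
```

The range returned by fileChars is an input range of characters, so it 
satisfies the corrected isCharRange constraint from the #78 note.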


Andrei

On 01/29/2011 05:44 PM, David Simcha wrote:
> I've written a small module for reading CSV and similar delimited files.
> I've been meaning to do this for a while. Basically, it allows reading a
> CSV file with O(1) memory usage (i.e. it can be parsed one character at
> a time) to a range of ranges of cells. Quotes, escaped quotes, etc. are
> handled properly. I tested it on a nasty CSV file produced by
> Affymetrix, and it works rather well.
>
> CSVRange also allows for iteration over rows as a range of structs. For
> example, let's say you had a file:
>
> Height,Weight,Shoe Size
> 6.5,210,13
> ...
>
> You could read this file lazily into a range of structs with something
> like:
>
> struct Person
> {
>     float height;
>     uint weight;
>     uint shoeSize;
> }
>
> auto csvRange = csvFile(someCharacterRange, ',');
> auto structs = csvStructRange(csvRange, ["Height", "Weight", "Shoe Size"]);
>
> // Iterate lazily through the rows.
> foreach(s; structs) {
>     // Do stuff.
> }
>
> Note that this still works even if you have tons of columns you don't
> care about in the file.
>
> Code:
>
> http://dsource.org/projects/scrapple/browser/trunk/csvRange/csvRange.d
>
> Docs:
>
> http://cis.jhu.edu/~dsimcha/csvRange.html
>

