[phobos] CSVRange: RFC

David Simcha dsimcha at gmail.com
Sat Jan 29 21:24:03 PST 2011


Jesse,

I was unaware of your efforts.  At first glance, your lib looks pretty 
good.  I definitely think Phobos needs a real CSV parser, as I seem to 
write ad-hoc ones all the time.  Since your module looks further along 
and better engineered than mine (mine was really just a prototype that 
I spent about half a day on), maybe we should focus on getting yours up 
to Phobos quality.  The one major feature yours is missing, though, is 
the ability for csvText() to extract a subset of the available columns 
by header.  I like selecting columns by header rather than hard-coding 
the column order, because it's less brittle if the layout changes.
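
To make the by-header idea concrete, here is a minimal sketch in D.  It 
assumes the rows have already been parsed into a string[][] whose first 
row is the header.  selectColumns is a hypothetical helper written only 
for illustration; it is not a function from either library.

```d
import std.exception : enforce;

// Hypothetical helper (an assumption, not part of either CSV module):
// returns the data rows projected onto the columns named in `wanted`,
// in the order requested.
string[][] selectColumns(string[][] rows, string[] wanted)
{
    enforce(rows.length > 0, "input has no header row");

    // Map each header name to its column index.
    size_t[string] columnOf;
    foreach (i, name; rows[0])
        columnOf[name] = i;

    // Resolve the requested headers once, failing loudly on a missing one.
    size_t[] indices;
    foreach (name; wanted)
    {
        auto p = name in columnOf;
        enforce(p !is null, "missing column: " ~ name);
        indices ~= *p;
    }

    // Project every data row; the file's own column order is irrelevant.
    string[][] result;
    foreach (row; rows[1 .. $])
    {
        string[] picked;
        foreach (i; indices)
        {
            enforce(i < row.length, "row is shorter than the header");
            picked ~= row[i];
        }
        result ~= picked;
    }
    return result;
}

unittest
{
    auto rows = [
        ["Height", "Weight", "Shoe Size"],
        ["6.5", "210", "13"],
    ];
    assert(selectColumns(rows, ["Shoe Size", "Height"]) == [["13", "6.5"]]);
}
```

Because the lookup goes through the header names, reordering or adding 
columns in the file does not change the result, which is the robustness 
argument above.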

--David Simcha

On 1/29/2011 10:47 PM, Jesse Phillips wrote:
> That is about the same as what I have, though I was attempting to
> handle custom delimiters for fields, records, and quotes.
>
> https://github.com/he-the-great/JPDLibs/tree/csv
>
> As for your code: I was getting a Range Violation with your
> unittests active. You also don't handle a quoted empty field
> correctly. Otherwise you pass the unittests I ported from mine:
>
> https://gist.github.com/802502
>
> On Sat, Jan 29, 2011 at 3:44 PM, David Simcha <dsimcha at gmail.com> wrote:
>> I've written a small module for reading CSV and similar delimited files.
>> I've been meaning to do this for a while.  Basically, it allows reading a
>> CSV file with O(1) memory usage (i.e. it can be parsed one character at a
>> time) to a range of ranges of cells.  Quotes, escaped quotes, etc. are
>> handled properly.  I tested it on a nasty CSV file produced by Affymetrix,
>> and it works rather well.
>>
>> CSVRange also allows for iteration over rows as a range of structs.  For
>> example, let's say you had a file:
>>
>> Height,Weight,Shoe Size
>> 6.5,210,13
>> ...
>>
>> You could read this file lazily into a range of structs with something like:
>>
>> struct Person
>> {
>>     float height;
>>     uint weight;
>>     uint shoeSize;
>> }
>>
>> auto csvRange = csvFile(someCharacterRange, ',');
>> auto structs = csvStructRange(csvRange, ["Height", "Weight", "Shoe Size"]);
>>
>> // Iterate lazily through the rows.
>> foreach(s; structs) {
>>     // Do stuff.
>> }
>>
>> Note that this still works even if you have tons of columns you don't care
>> about in the file.
>>
>> Code:
>>
>> http://dsource.org/projects/scrapple/browser/trunk/csvRange/csvRange.d
>>
>> Docs:
>>
>> http://cis.jhu.edu/~dsimcha/csvRange.html


