[phobos] CSVRange: RFC
David Simcha
dsimcha at gmail.com
Sat Jan 29 21:24:03 PST 2011
Jesse,
I was unaware of your efforts. At first glance, your lib looks pretty
good. I definitely think Phobos needs a real CSV parser, as I seem to
write ad-hoc ones all the time. Since your module looks further along
and better engineered than mine (mine was really just a prototype that I
spent about half a day on), maybe we should focus on getting yours up to
Phobos quality. The one major feature yours is missing, though, is the
ability for csvText() to extract a subset of the available columns by
header. I also like the idea of addressing columns by header instead of
hard-coding the column order, because it's less brittle if the layout
changes.
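
As a rough illustration of selecting columns by header, here is a
minimal sketch. It is not code from either library: selectColumns is a
hypothetical name, and the sketch assumes plain comma-separated fields
with no quoting or escaping, which a real parser would have to handle.

```d
import std.array : split;
import std.exception : enforce;
import std.string : splitLines;

/// Hypothetical helper (not part of either library): for each data row,
/// returns only the cells under the requested headers, in the order the
/// headers were requested. Assumes unquoted, comma-separated fields.
string[][] selectColumns(string text, string[] wanted)
{
    auto lines = text.splitLines();
    enforce(lines.length > 0, "missing header row");

    // Record the position of every column name found in the header row.
    size_t[string] position;
    foreach (i, name; lines[0].split(","))
        position[name] = i;

    // Translate the requested names into column indices.
    size_t[] indices;
    foreach (name; wanted)
    {
        auto p = name in position;
        enforce(p !is null, "no such column: " ~ name);
        indices ~= *p;
    }

    // Pick the requested cells out of each data row.
    string[][] rows;
    foreach (line; lines[1 .. $])
    {
        auto cells = line.split(",");
        string[] picked;
        foreach (i; indices)
        {
            enforce(i < cells.length, "row is shorter than the header");
            picked ~= cells[i];
        }
        rows ~= picked;
    }
    return rows;
}
```

With the example file quoted below, selectColumns(text, ["Shoe Size",
"Height"]) would return the shoe size and height of each row in that
order, regardless of where those columns sit in the file.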
--David Simcha
On 1/29/2011 10:47 PM, Jesse Phillips wrote:
> That is about the same as what I have, though I was attempting to
> handle custom delimiters for fields, records, and quotes.
>
> https://github.com/he-the-great/JPDLibs/tree/csv
>
> As for your code: I was getting a Range Violation with your
> unittests active. Also, you don't handle a quoted empty field
> correctly. Otherwise it passes the unittest I ported from mine:
>
> https://gist.github.com/802502
>
> On Sat, Jan 29, 2011 at 3:44 PM, David Simcha <dsimcha at gmail.com> wrote:
>> I've written a small module for reading CSV and similar delimited files.
>> I've been meaning to do this for a while. Basically, it allows reading a
>> CSV file with O(1) memory usage (i.e. it can be parsed one character at a
>> time) to a range of ranges of cells. Quotes, escaped quotes, etc. are
>> handled properly. I tested it on a nasty CSV file produced by Affymetrix,
>> and it works rather well.
>>
>> CSVRange also allows for iteration over rows as a range of structs. For
>> example, let's say you had a file:
>>
>> Height,Weight,Shoe Size
>> 6.5,210,13
>> ...
>>
>> You could read this file lazily into a range of structs with something like:
>>
>> struct Person
>> {
>>     float height;
>>     uint weight;
>>     uint shoeSize;
>> }
>>
>> auto csvRange = csvFile(someCharacterRange, ',');
>> auto structs = csvStructRange(csvRange, ["Height", "Weight", "Shoe Size"]);
>>
>> // Iterate lazily through the rows.
>> foreach (s; structs) {
>>     // Do stuff.
>> }
>>
>> Note that this still works even if you have tons of columns you don't care
>> about in the file.
>>
>> Code:
>>
>> http://dsource.org/projects/scrapple/browser/trunk/csvRange/csvRange.d
>>
>> Docs:
>>
>> http://cis.jhu.edu/~dsimcha/csvRange.html