[RFC] CSV parser

Jesse Phillips jessekphillips+d at gmail.com
Mon Apr 4 22:44:34 PDT 2011


I have implemented an input-range-based CSV parser that works on text
input[1]. I combined my original implementation with some details of
David's implementation[2]. It is not ready for formal review yet: I still
need to update and polish the documentation and probably consolidate the
unit tests.

It provides a very simple interface: you can either iterate over every
field individually or have each record stored in a struct. The unit
tests and examples[3] show the interface well, but here is one example
(taken from the unit tests) that uses a struct and a header:

    string str = "a,b,c\nHello,65,63.63\nWorld,123,3673.562";
    struct Layout
    {
        int value;
        double other;
        string name;
    }

    auto records = csvText!Layout(str, ["b","c","a"]);

    Layout[2] ans;
    ans[0].name = "Hello";
    ans[0].value = 65;
    ans[0].other = 63.63;
    ans[1].name = "World";
    ans[1].value = 123;
    ans[1].other = 3673.562;

    int count;
    foreach (record; records)
    {
        assert(ans[count].name == record.name);
        assert(ans[count].value == record.value);
        assert(ans[count].other == record.other);
        count++;
    }
    assert(count == 2);
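In that example the header argument appears to list column names in the
order of Layout's fields: column b feeds value, c feeds other, and a
feeds name. A minimal sketch of such a mapping, assuming that reading of
the header argument (headerIndices is a hypothetical helper, not part of
csv.d):

```d
import std.algorithm.searching : countUntil;
import std.exception : enforce;

/// Hypothetical helper: for each wanted column name, find its position in
/// the file's header row. Struct field i would then be read from column
/// result[i] of every record.
size_t[] headerIndices(string[] fileHeader, string[] wanted)
{
    size_t[] order;
    foreach (name; wanted)
    {
        immutable pos = fileHeader.countUntil(name); // -1 when absent
        enforce(pos >= 0, "column not found in header: " ~ name);
        order ~= cast(size_t) pos;
    }
    return order;
}
```

With the file header a,b,c and the requested header b,c,a this yields
[1, 2, 0], so the first field (value) reads "65" from the second column.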

Most of the work is done in the function csvNextToken. I think it might
be useful to make this function public, since it would let users write
their own parser or recover from malformed data.
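To illustrate what a token-level function does, here is a hypothetical
sketch in the same spirit. It is not the csvNextToken signature from
csv.d: the name nextTokenSketch, its parameters, and the assumption that
only '\n' ends a record are all made up for illustration. It appends one
field to a caller-supplied Appender and advances the input past it,
turning "" inside a quoted field into a single quote.

```d
import std.array : appender, Appender;

/// Hypothetical token reader: appends one field from `input` to `ans` and
/// advances `input` to the separator or '\n' that ends the field (the
/// terminator itself is left for the caller to consume).
void nextTokenSketch(ref string input, ref Appender!(char[]) ans,
                     char sep = ',', char quote = '"')
{
    if (input.length && input[0] == quote)
    {
        input = input[1 .. $];                  // skip the opening quote
        while (input.length)
        {
            if (input[0] == quote)
            {
                if (input.length > 1 && input[1] == quote)
                {
                    ans.put(quote);             // "" is an escaped quote
                    input = input[2 .. $];
                    continue;
                }
                input = input[1 .. $];          // closing quote
                break;
            }
            ans.put(input[0]);                  // separators and newlines are data here
            input = input[1 .. $];
        }
    }
    // Unquoted data (or stray text after a closing quote) runs until the
    // next separator or line break.
    while (input.length && input[0] != sep && input[0] != '\n')
    {
        ans.put(input[0]);
        input = input[1 .. $];
    }
}
```

Because the caller consumes the terminator, it can decide what to do with
unexpected input instead of the function throwing. This sketch is lenient
by design: text after a closing quote is simply appended to the field.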

To be memory efficient, the same appender is reused for each iteration.
However, the default behavior still makes a copy of each field. To
avoid the copy, just provide the type as char[]:

    import std.stdio : writeln;

    string str = `one,two,"three ""quoted""","",` ~ "\"five\nnew line\n\"\nsix";
    auto records = csvText!(char[])(str);

    foreach (record; records)
    {
        foreach (cell; record)
        {
            // cell is a slice of the reused buffer; use cell.idup to keep it
            writeln(cell);
        }
    }

If your struct stores char[] instead of string, you will also observe
this overwriting behavior caused by reusing the buffer. Should this be
fixed?
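The overwriting can be reproduced with std.array.Appender alone. This is
a minimal sketch, not code from csv.d; ReuseResult and demonstrateReuse
are hypothetical names, and it assumes the parser keeps one
Appender!(char[]) and calls clear() between fields.

```d
import std.array : appender;

/// Hypothetical result type for this demonstration only.
struct ReuseResult
{
    char[] borrowed; // slice taken before the buffer was reused
    string owned;    // .idup copy taken before the buffer was reused
    char[] current;  // buffer contents after reuse
}

/// Fills one reused Appender twice, the way a parser that recycles its
/// buffer per field would, and reports what each kind of result holds.
ReuseResult demonstrateReuse()
{
    auto buf = appender!(char[])();

    buf.put("Hello");
    char[] borrowed = buf.data;   // no copy: points into buf's memory
    string owned = buf.data.idup; // copy: independent of buf

    buf.clear();                  // length back to 0, allocation kept
    buf.put("World");             // fits in capacity, so the same memory is reused

    return ReuseResult(borrowed, owned, buf.data);
}
```

After the call, owned still reads "Hello" while borrowed now reads
"World", which is the same effect a char[] field in a struct would show.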

So feel free to suggest names, implementation corrections, or
documentation improvements, or just give a thumbs up. The more interest
there is, the more motivated I'll be to get this done sooner :)

1. https://github.com/he-the-great/JPDLibs/blob/csv/csv/csv.d
2. http://lists.puremagic.com/pipermail/phobos/2011-January/004300.html
3. https://github.com/he-the-great/JPDLibs/tree/csv/examples


More information about the Digitalmars-d mailing list