Speed of csvReader

cym13 via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Jan 21 15:03:23 PST 2016


On Thursday, 21 January 2016 at 21:24:49 UTC, H. S. Teoh wrote:
> [...]

It may be fast, but I think that is largely because it is not a
CSV parser. Don't get me wrong, it can parse a format defined by
delimiters, but true CSV is one hell of a beast. Of course most
data look like:

     number,name,price,comment
     1,Twilight,150,good friend
     2,Fluttershy,142,gentle
     3,Pinkie Pie,169,oh my gosh

but you can have delimiters inside a field:

     number,name,price,comment
     1,Twilight,150,good friend
     2,Fluttershy,"14,2",gentle
     3,Pinkie Pie,169,oh my gosh

or quotes in a quoted field, in which case you have to double the
quotes:

     number,name,price,comment
     1,Twilight,150,good friend
     2,Fluttershy,142,gentle
     3,Pinkie Pie,169,"He said ""oh my gosh"""

but in that case some writers leave out the external quotes, even
though RFC 4180 requires them:

     number,name,price,comment
     1,Twilight,150,good friend
     2,Fluttershy,142,gentle
     3,Pinkie Pie,169,He said ""oh my gosh""

but at least it's always one record per line, no? No? No.

     number,name,price,comment
     1,Twilight,150,good friend
     2,Fluttershy,142,gentle
     3,Pinkie Pie,169,"He said
     ""oh my gosh""
     And she replied
     ""Come on! Have fun!"""
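
To make the multi-line case concrete, here is a small sketch. It uses Python's standard csv module purely as an illustration of a quote-aware parser; it is not the D code discussed in this thread, and the variable names are mine. It counts records in the sample above twice, once by splitting on line breaks and once with a real parser.

```python
import csv
import io

# The sample from above: the last record spans four physical lines.
data = (
    "number,name,price,comment\n"
    "1,Twilight,150,good friend\n"
    "2,Fluttershy,142,gentle\n"
    '3,Pinkie Pie,169,"He said\n'
    '""oh my gosh""\n'
    "And she replied\n"
    '""Come on! Have fun!"""\n'
)

# Naive approach: one record per physical line.
naive_records = data.splitlines()

# Quote-aware approach: newline="" stops StringIO from translating
# the line breaks that live inside the quoted field.
parsed_records = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive_records))    # physical lines
print(len(parsed_records))   # logical records
print(repr(parsed_records[-1][3]))
```

The line-based count sees seven "records" where the parser sees four, and the last comment comes back as a single field with its line breaks and quotes intact.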

I'll stop there, but you get the picture. Simply splitting by
line and then by separator may work well on most data, but I
wouldn't put it in production or in the standard library. Note
that I think you did a great job optimizing your code, and I
respect that; this is just a friendly reminder.
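
For anyone who wants to see the difference on the single-line cases, here is a rough sketch of "split by line then separator" next to a quote-aware parser. It is written in Python with the standard csv module only because that is easy to run; split_naively is a hypothetical helper of mine, not anything from the code in this thread.

```python
import csv
import io

# Two of the tricky rows from the samples above.
data = (
    "number,name,price,comment\n"
    '2,Fluttershy,"14,2",gentle\n'
    '3,Pinkie Pie,169,"He said ""oh my gosh"""\n'
)

def split_naively(text):
    """Split by line, then by separator, ignoring quoting entirely."""
    return [line.split(",") for line in text.splitlines()]

naive_rows = split_naively(data)
parsed_rows = list(csv.reader(io.StringIO(data, newline="")))

# Print both views of each row, with the field count first.
for naive, parsed in zip(naive_rows, parsed_rows):
    print(len(naive), naive)
    print(len(parsed), parsed)
```

The naive version cuts the price "14,2" into two fields and leaves the doubled quotes undecoded in the comment, while the parser returns four clean fields for every row.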


