Speed of csvReader

H. S. Teoh via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Jan 21 16:26:16 PST 2016


On Thu, Jan 21, 2016 at 11:03:23PM +0000, cym13 via Digitalmars-d-learn wrote:
> On Thursday, 21 January 2016 at 21:24:49 UTC, H. S. Teoh wrote:
> >[...]
> 
> It may be fast but I think it may be related to the fact that this is
> not a CSV parser. Don't get me wrong, it is able to parse a format
> defined by delimiters but true CSV is one hell of a beast.

Alright, I decided to take on the challenge of writing a "real" CSV
parser... since it's a bit tedious to keep posting code in the forum,
I've pushed it to GitHub instead:

	https://github.com/quickfur/fastcsv


[...]
> but you can have delimiters inside a field:
> 
>     number,name,price,comment
>     1,Twilight,150,good friend
>     2,Fluttershy,"14,2",gentle
>     3,Pinkie Pie,169,oh my gosh

Fixed.
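
For anyone following along, the difference between a plain split and a
quote-aware one can be sketched in a few lines. This is only an
illustration in Python, not the code in the repository; `split_fields`
is a made-up helper, and it assumes the quote character simply toggles
an "inside quotes" state and that fields are returned raw, with their
enclosing quotes still attached:

```python
def split_fields(line, delim=",", quote='"'):
    """Split one line on delim, ignoring delimiters inside quotes.

    Hypothetical helper for illustration only. Fields are returned raw:
    enclosing quotes are kept and nothing is unescaped.
    """
    fields = []
    start = 0
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == quote:
            # A doubled quote toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        elif ch == delim and not in_quotes:
            fields.append(line[start:i])
            start = i + 1
    fields.append(line[start:])
    return fields


line = '2,Fluttershy,"14,2",gentle'
naive = line.split(",")      # 5 pieces: the price is cut in two
aware = split_fields(line)   # 4 fields: '"14,2"' stays in one piece
```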


> or quotes in a quoted field, in that case you have to double the quotes:
> 
>     number,name,price,comment
>     1,Twilight,150,good friend
>     2,Fluttershy,142,gentle
>     3,Pinkie Pie,169,"He said ""oh my gosh"""

Fixed.  Well, except that I don't actually interpret the doubled quotes;
I leave them in the field and let the caller filter them out at the
application level.
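
The caller-side cleanup can be very small. A minimal sketch in Python
(not part of the parser; `unescape_field` is a hypothetical helper, and
it assumes the field arrives with any enclosing quotes already removed
but with the doubled quotes intact):

```python
def unescape_field(raw, quote='"'):
    """Collapse each doubled quote in a raw field into a single quote.

    Hypothetical application-level helper; assumes the enclosing quotes
    (if any) have already been stripped from raw.
    """
    return raw.replace(quote * 2, quote)


raw = 'He said ""oh my gosh""'
clean = unescape_field(raw)   # 'He said "oh my gosh"'
```

Fields that contain no doubled quotes pass through unchanged, so a
caller could apply this only to the columns where it matters.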


> but in that case external quotes aren't required:
> 
>     number,name,price,comment
>     1,Twilight,150,good friend
>     2,Fluttershy,142,gentle
>     3,Pinkie Pie,169,He said ""oh my gosh""

Actually, this already worked before. (Except for the untranslated
doubled quotes, of course.)


> but at least it's always one record per line, no? No? No.
> 
>     number,name,price,comment
>     1,Twilight,150,good friend
>     2,Fluttershy,142,gentle
>     3,Pinkie Pie,169,"He said
>     ""oh my gosh""
>     And she replied
>     ""Come on! Have fun!"""

Fixed.
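
To illustrate why embedded newlines rule out splitting by line first,
here is a rough single-pass sketch in Python. It is not the code in the
repository, just one way such a scan could look; `parse_records` is a
hypothetical name, and the sketch assumes "\n" line endings, raw fields
(enclosing and doubled quotes left untouched), and well-formed input:

```python
def parse_records(text, delim=",", quote='"'):
    """Scan the whole input once and return a list of records.

    Hypothetical sketch: each record is a list of raw fields. A newline
    ends a record only when the scanner is outside a quoted field.
    """
    records, fields = [], []
    start = 0
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == quote:
            # Doubled quotes toggle twice and leave the state unchanged.
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # delimiters and newlines inside quotes are data
        elif ch == delim:
            fields.append(text[start:i])
            start = i + 1
        elif ch == "\n":
            fields.append(text[start:i])
            records.append(fields)
            fields = []
            start = i + 1
    # Flush a final record that is not terminated by a newline.
    if start < len(text) or fields:
        fields.append(text[start:])
        records.append(fields)
    return records


data = (
    'number,name,price,comment\n'
    '2,Fluttershy,142,gentle\n'
    '3,Pinkie Pie,169,"He said\n'
    '""oh my gosh""\n'
    'And she replied\n'
    '""Come on! Have fun!"""\n'
)
records = parse_records(data)   # 3 records, although the input has 6 lines
```

The last record keeps its four-line comment as a single field, which a
line-based split cannot do.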


> I'll stop there, but you get the picture. Simply splitting by line
> then separator may work well on most data, but I wouldn't put it in
> production or in the standard library.

Actually, my code does *not* split by line then by separator. Did you
read it? ;-)


T

-- 
The most powerful one-line C program: #include "/dev/tty" -- IOCCC

