Speed of csvReader

H. S. Teoh via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Jan 21 17:27:13 PST 2016


On Fri, Jan 22, 2016 at 01:13:07AM +0000, Jesse Phillips via Digitalmars-d-learn wrote:
> On Thursday, 21 January 2016 at 23:03:23 UTC, cym13 wrote:
> >but in that case external quotes aren't required:
> >
> >    number,name,price,comment
> >    1,Twilight,150,good friend
> >    2,Fluttershy,142,gentle
> >    3,Pinkie Pie,169,He said ""oh my gosh""
> 
> std.csv will reject this. If validation is turned off this is fine but
> your data will include "".
> 
> "A field containing new lines, commas, or double quotes should be
> enclosed in double quotes (customizable)"
> 
> This is because it is not possible to decide what correct parsing should
> be. Is the data including two double quotes? What if there was
> only one quote there, do I have to remember it was there and decide
> not to throw it out because I didn't see another quote? At this point
> the data is not following CSV rules so if I'm validating I'm throwing
> it out and if I'm not validating I'm not stripping data.

This case is still manageable, because there are no embedded commas.
Everything between the previous comma and the next comma or newline
unambiguously belongs to the current field.  How to interpret the
contents (should the result contain single or doubled quotes?) is a
separate question, though, and that could be problematic.
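
To make the point concrete, here is a rough sketch in Python (chosen only
for brevity; it is not how std.csv or my own code works, and the name
split_unquoted is made up for illustration). It assumes the record has no
quoted fields at all, so every comma is a delimiter, and it shows the two
possible readings of the last field:

```python
# Illustrative sketch only; assumes the record contains no quoted fields.
def split_unquoted(line):
    """Split a record on commas, treating every comma as a delimiter."""
    return line.rstrip("\r\n").split(",")

record = '3,Pinkie Pie,169,He said ""oh my gosh""'
fields = split_unquoted(record)

# Reading A: keep the field verbatim, so doubled quotes stay doubled.
verbatim = fields[3]

# Reading B: treat "" as an escaped quote, as it would be inside a
# quoted field.
collapsed = fields[3].replace('""', '"')

print(verbatim)
print(collapsed)
```

The field boundaries come out the same either way; only the contents of
the last field depend on which reading you pick.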

And now that you mention this, RFC-4180 does not allow doubled quotes in
an unquoted field. I'll take that out of the code (it improves
performance :-D).
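
For reference, the RFC 4180 grammar defines an unquoted field as a run of
TEXTDATA, which excludes the double quote, comma, CR and LF. A strict
check along those lines could look roughly like this (again a Python
sketch with a made-up function name, not the actual code):

```python
# Illustrative sketch only; is_valid_unquoted_field is a hypothetical name.
def is_valid_unquoted_field(field):
    """Return True if the field contains only RFC 4180 TEXTDATA."""
    # TEXTDATA excludes double quote, comma, CR and LF.
    return not any(ch in '",\r\n' for ch in field)

print(is_valid_unquoted_field("gentle"))
print(is_valid_unquoted_field('He said ""oh my gosh""'))
```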


T

-- 
First Rule of History: History doesn't repeat itself -- historians merely repeat each other.

