Speed of csvReader

Jesse Phillips via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri Jan 22 07:52:42 PST 2016


On Friday, 22 January 2016 at 01:36:40 UTC, cym13 wrote:
> On Friday, 22 January 2016 at 01:27:13 UTC, H. S. Teoh wrote:
>> And now that you mention this, RFC-4180 does not allow doubled 
>> quotes in an unquoted field. I'll take that out of the code 
>> (it improves performance :-D).
>
> Right, re-reading the RFC would have been a great thing. That 
> said I saw that kind of CSV in the real world, so I don't know 
> what to think of it. I'm not saying it should be supported, but 
> I wonder if there are points outside RFC-4180 that are taken 
> for granted.

You have to understand that CSV didn't come from a standard. People 
started using it because it was a simple way to write out some 
tabular data. Then they changed it because their data changed. It's 
not like their language came with a CSV parser; it was always hand 
written, and people still do that today. That is why data is 
delimited with so many things other than commas (people thought 
they wouldn't need to escape their data).

So yes, some CSV parsers will accept comments, but that just means 
they break for people who have # in their data. Yes, you can 
assume that two double quotes in unquoted data are just one quote, 
but then the parser breaks for those whose data really contains 
two unescaped double quotes.

There are also many other issues with CSV data, such as whether the 
file is in ASCII, UTF, or some other code page. And many times CSV 
isn't well formed because the data was output without proper 
escaping.

std.csv isn't the be-all and end-all of CSV parsers, but it will at 
least handle well-formed CSV that uses different separators or 
quotes.
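
To illustrate the point about separators and quotes, here is a 
minimal sketch that passes a custom delimiter and quote character 
to csvReader. The helper name parseDelimited and the sample data 
are made up for this example; only csvReader itself comes from 
std.csv.

```d
import std.array : array;
import std.csv : csvReader;
import std.stdio : writeln;

// Hypothetical helper: read every record of `text` into an array of
// string fields, using the given separator and quote characters.
string[][] parseDelimited(string text, char sep, char quote)
{
    string[][] rows;
    // csvReader takes the delimiter and the quote as optional arguments.
    foreach (record; csvReader!string(text, sep, quote))
        rows ~= record.array; // consume each record fully before moving on
    return rows;
}

void main()
{
    // Semicolon-separated data that uses single quotes for quoting.
    // The quoted fields contain the separator and a doubled quote.
    string text = "name;note\n'Smith; J.';'it''s fine'";

    foreach (row; parseDelimited(text, ';', '\''))
        writeln(row);
}
```

This only works because the sample input is well formed for the 
chosen separator and quote. Input that is not will, with the 
default settings, be rejected with an exception rather than 
guessed at.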

