Speed of csvReader
data pulverizer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Jan 21 07:17:08 PST 2016
On Thursday, 21 January 2016 at 14:56:13 UTC, Saurabh Das wrote:
> On Thursday, 21 January 2016 at 14:32:52 UTC, Saurabh Das wrote:
>> On Thursday, 21 January 2016 at 13:42:11 UTC, Edwin van
>> Leeuwen wrote:
>>> On Thursday, 21 January 2016 at 09:39:30 UTC, data pulverizer
>>> wrote:
>>>> StopWatch sw;
>>>> sw.start();
>>>> auto buffer = std.file.readText("Acquisition_2009Q2.txt");
>>>> auto records = csvReader!row_type(buffer, '|').array;
>>>> sw.stop();
>>>
>>>
>>> Is it csvReader or readText that is slow? i.e. could you move
>>> sw.start(); one line down (after the readText command) and
>>> see how long just the csvReader part takes?
>>
>> Please try this:
>>
>> auto records =
>> File("Acquisition_2009Q2.txt").byLine.joiner("\n").csvReader!row_type('|').array;
>>
>> Can you put up some sample data and share the number of
>> records in the file as well.
>
> Actually since you're aiming for speed, this might be better:
>
> sw.start();
> auto records =
> File("Acquisition_2009Q2.txt").byChunk(1024*1024).joiner.map!(a
> => cast(dchar)a).csvReader!row_type('|').array;
> sw.stop();
>
> Please do verify that the end result is the same - I'm not 100%
> confident of the cast.
>
> Thanks,
> Saurabh
@Saurabh: I have tried your latest suggestion (byChunk), and the time
drops only fractionally, to:
Time (s): 6.345
Your previous suggestion (byLine) actually increased the time.
@Edwin van Leeuwen: csvReader is what takes most of the time;
readText on its own takes 0.229 s.
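
For reference, the two stages can be timed separately along these lines. This is only a minimal sketch, not the exact code I ran: RowType is a hypothetical three-column layout (the real row_type is not shown in this thread), parsePipeDelimited is a helper name made up for the example, and the small temporary file is a stand-in for Acquisition_2009Q2.txt. It also assumes a current Phobos, where StopWatch lives in std.datetime.stopwatch rather than std.datetime.

```d
import std.array : array;
import std.csv : csvReader;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.stdio : writefln;
import std.typecons : Tuple;

// Hypothetical row layout; substitute the real row_type for the actual file.
alias RowType = Tuple!(string, int, double);

// Hypothetical helper: parse a '|'-delimited buffer into an array of rows.
RowType[] parsePipeDelimited(string buffer)
{
    return csvReader!RowType(buffer, '|').array;
}

void main()
{
    // Tiny stand-in file so the sketch runs anywhere; swap in the real path.
    auto path = buildPath(tempDir, "sample_pipe_delimited.txt");
    write(path, "a|1|1.5\nb|2|2.5");
    scope (exit) remove(path);

    // Stage 1: file I/O only.
    auto sw = StopWatch(AutoStart.yes);
    auto buffer = readText(path);
    auto readTime = sw.peek;

    // Stage 2: CSV parsing only. reset() zeroes the elapsed time
    // without stopping the watch.
    sw.reset();
    auto records = parsePipeDelimited(buffer);
    auto parseTime = sw.peek;

    writefln("readText  (s): %.3f", readTime.total!"usecs" / 1e6);
    writefln("csvReader (s): %.3f", parseTime.total!"usecs" / 1e6);
    writefln("records: %s", records.length);
}
```

Timing each stage with its own peek keeps the I/O cost out of the parsing figure, which is how the 0.229 s for readText can be separated from the csvReader time.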