Speed of csvReader
data pulverizer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Jan 21 11:25:01 PST 2016
On Thursday, 21 January 2016 at 19:08:38 UTC, data pulverizer
wrote:
> On Thursday, 21 January 2016 at 18:46:03 UTC, Justin Whear
> wrote:
>> On Thu, 21 Jan 2016 18:37:08 +0000, data pulverizer wrote:
>>
>>> It's interesting that the first output array is not the same
>>> as the input
>>
>> byLine reuses a buffer (for speed) and the subsequent split
>> operation just returns slices into that buffer. So when
>> byLine progresses to the next line the strings (slices)
>> returned previously now point into a buffer with different
>> contents. You should either use byLineCopy or .idup to create
>> copies of the relevant strings. If your use-case allows for
>> streaming and doesn't require having all the data present at
>> once, you could continue to use byLine and just be careful not
>> to refer to previous rows.
>
> Thanks. It now works with byLineCopy()
>
> Time (s): 1.128
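For reference, here is a minimal D sketch of the difference Justin describes: byLine reuses its internal buffer, so row slices must be copied (byLineCopy or .idup) if they are kept around. The file name and '|' separator are taken from the pandas script below; the actual benchmark code may differ from this sketch.

// Minimal sketch: keep rows valid after iteration by copying each line.
import std.stdio : File, writeln;
import std.array : array;
import std.algorithm : map, splitter;

void main()
{
    // byLine would reuse one internal buffer, so slices returned by
    // splitter are invalidated when the next line is read; byLineCopy
    // allocates a fresh string per line, so the slices stay valid.
    auto rows = File("Acquisition_2009Q2.txt")   // same file as the pandas test below
        .byLineCopy
        .map!(line => line.splitter('|').array)
        .array;

    writeln(rows.length, " rows");
}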
Currently the timing is similar to that of Python's pandas:
# Script (Python 2.7.6)
import pandas as pd
import time

col_types = {'col1': str, 'col2': str, 'col3': str, 'col4': str,
             'col5': str, 'col6': str, 'col7': str, 'col8': str,
             'col9': str, 'col10': str, 'col11': str, 'col12': str,
             'col13': str, 'col14': str, 'col15': str, 'col16': str,
             'col17': str, 'col18': str, 'col19': str, 'col20': str,
             'col21': str, 'col22': str}

begin = time.time()
x = pd.read_csv('Acquisition_2009Q2.txt', sep='|', dtype=col_types)
end = time.time()
print end - begin
$ python file_read.py
1.19544792175