D is for Data Science

Dmitry Olshansky via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Mon Nov 24 15:36:37 PST 2014


25-Nov-2014 01:28, bearophile wrote:
> Dmitry Olshansky:
>
>>> Why is File.byLine so slow?
>>
>> Seems to be mostly fixed sometime ago.
>
> Really? I am not so sure.
>
> Bye,
> bearophile

I too suspected it in the past, and then I tested it.
Now I have tested it again; it's always easier to check than to argue.

Two minimal programs:
//my.d:
import std.stdio;

void main(string[] args) {
     auto file = File(args[1], "r");
     size_t cnt=0;
     foreach(char[] line; file.byLine()) {
         cnt++;
     }
}
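//my2.d:
import core.stdc.stdio;
import std.string : toStringz;

void main(string[] args) {
     char[] buf = new char[32768];
     size_t cnt;
     // toStringz guarantees the NUL terminator that fopen requires;
     // a D string's .ptr is not guaranteed to be NUL-terminated.
     shared(FILE)* file = fopen(args[1].toStringz, "r");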
     while(fgets(buf.ptr, cast(int)buf.length, file) != null){
         cnt++;
     }
     fclose(file);
}
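
Neither program above prints cnt, so nothing confirms that both loops see the same number of lines. A minimal sketch of a sanity check follows, assuming a current D compiler; the file name count.d and the helper names countByLine and countFgets are hypothetical and not part of the benchmark. Both counts should match what `wc -l` reports for a file that ends with a newline.

```d
//count.d: hypothetical sanity check, not one of the benchmarked programs
import core.stdc.stdio : fopen, fgets, fclose;
import std.stdio : File, writeln;
import std.string : toStringz;

// Count lines with the range-based std.stdio API.
size_t countByLine(string path) {
    size_t cnt = 0;
    foreach (line; File(path, "r").byLine()) {
        cnt++;
    }
    return cnt;
}

// Count lines with C's fgets. Assumes no line is longer than the
// buffer; a longer line would be counted once per chunk read.
size_t countFgets(string path) {
    auto buf = new char[32768];
    size_t cnt = 0;
    auto file = fopen(path.toStringz, "r");
    if (file is null) return 0; // treat an unreadable file as empty
    while (fgets(buf.ptr, cast(int) buf.length, file) !is null) {
        cnt++;
    }
    fclose(file);
    return cnt;
}

void main(string[] args) {
    if (args.length < 2) {
        writeln("usage: count FILE");
        return;
    }
    writeln("byLine: ", countByLine(args[1]));
    writeln("fgets:  ", countFgets(args[1]));
}
```

Printing the counters also gives each loop an observable result, which makes it harder for an optimizing build to discard the work being timed.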

In the console session below, the log file is my dmesg log replicated many 
times (34 MB total).

dmitry@Ubu64 ~ $ wc -l log
522240 log
dmitry@Ubu64 ~ $ du -hs log
34M	log

# read it once, so that it is in the disk cache:
dmitry@Ubu64 ~ $ cat log > /dev/null

dmitry@Ubu64 ~ $ dmd my
dmitry@Ubu64 ~ $ dmd my2

dmitry@Ubu64 ~ $ time ./my2 log

real	0m0.062s
user	0m0.039s
sys	0m0.023s
dmitry@Ubu64 ~ $ time ./my log

real	0m0.181s
user	0m0.155s
sys	0m0.025s

~4 times slower in user mode, okay...
Now with full optimizations (ranges are very sensitive to them):

dmitry@Ubu64 ~ $ dmd -O -release -inline  my
dmitry@Ubu64 ~ $ dmd -O -release -inline  my2
dmitry@Ubu64 ~ $ time ./my2 log

real	0m0.065s
user	0m0.042s
sys	0m0.023s
dmitry@Ubu64 ~ $ time ./my2 log

real	0m0.063s
user	0m0.040s
sys	0m0.023s

Which is 1:1 parity. Another myth busted? ;)

-- 
Dmitry Olshansky

