Read text file fast, how?

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Sun Jul 26 08:36:37 PDT 2015


On 7/26/15 10:35 AM, Johan Holmberg via Digitalmars-d wrote:
>
> On Sat, Jul 25, 2015 at 10:12 PM, Andrei Alexandrescu via Digitalmars-d
> <digitalmars-d at puremagic.com> wrote:
>
>     On 7/25/15 1:53 PM, Johan Holmberg via Digitalmars-d wrote:
>
>         Thanks, my question seems like a carbon copy of the Stack Overflow
>         article :) Somehow I had missed it when googling.
>
>         I downloaded a dmd 2.068 beta and retried with my input file:
>         now the D program takes 1.6s (a 10x improvement).
>
>
>     Great, though it still seems to be behind the C++ version, which is
>     a bummer. -- Andrei
>
>
> My C++ program was actually doing C-style IO via <stdio.h>. I didn't
> think about the C/C++ distinction when reporting the earlier numbers.
>
> If I switch to full C++ style (<fstream> + <string> + the C++ version of
> getline()), then the C++ solution is even slower than Python: 5.2s. I
> think it is Clang's C++ libraries on OS X Yosemite that are slow.
>
> This prompted me to re-run the tests on a Linux machine (Ubuntu 14.04),
> still with the same input file, a text file with 7M lines and total size
> of 466MB:
>
> C++ with <stdio.h> style IO:   0.40s
> C++ with <fstream> style IO:   0.31s
> D 2.067:                       1.75s
> D 2.068 beta 2:                0.69s
> Perl:                          1.49s
> Python:                        1.86s
>
> So on Ubuntu, the C++ <fstream> version was clearly best, and the
> improvement in DMD 2.068 beta was "only" a factor of 2.5 over 2.067.
>
> /johan

I think we should investigate this and bring performance to par. Anyone 
interested? -- Andrei
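
For anyone picking this up, here is a minimal sketch of the two C++ IO
styles compared above. Johan's benchmark source is not shown in this
thread, so everything here is an assumption for illustration: the function
names count_lines_stdio and count_lines_fstream, the line-counting
workload, and the use of POSIX getline() for the <stdio.h> variant (he may
well have used fgets() or something else).

```cpp
#include <stdio.h>      // fopen, fclose, getline (POSIX, not ISO C)
#include <stdlib.h>     // free
#include <sys/types.h>  // ssize_t
#include <fstream>
#include <string>

// C-style variant (assumed): POSIX getline() reuses and grows a single
// malloc'd buffer, so there is no per-line allocation in steady state.
// Returns the number of lines, or -1 if the file cannot be opened.
long count_lines_stdio(const char* path) {
    FILE* f = fopen(path, "r");
    if (!f) return -1;
    char* buf = nullptr;
    size_t cap = 0;
    long n = 0;
    while (getline(&buf, &cap, f) != -1) ++n;
    free(buf);
    fclose(f);
    return n;
}

// C++-style variant (assumed): std::getline() reuses one std::string,
// which keeps its capacity between iterations.
// Returns the number of lines, or -1 if the file cannot be opened.
long count_lines_fstream(const std::string& path) {
    std::ifstream in(path);
    if (!in) return -1;
    std::string line;
    long n = 0;
    while (std::getline(in, line)) ++n;
    return n;
}
```

Both loops reuse one line buffer across iterations, so a D version that
is compared against them should probably do the same to keep the
comparison fair.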

