Reading from stdin significantly slower than reading file directly?

Steven Schveighoffer schveiguy at gmail.com
Thu Aug 13 14:41:02 UTC 2020


On 8/12/20 6:44 PM, methonash wrote:
> Hi,
> 
> Relative beginner to D-lang here, and I'm very confused by the apparent 
> performance disparity I've noticed between programs that do the following:
> 
> 1) cat some-large-file | D-program-reading-stdin-byLine()
> 
> 2) D-program-directly-reading-file-byLine() using File() struct
> 
> The difference I've noticed between options (1) and (2) is roughly 
> 80% more wall time (7.5s vs 4.1s), which seems pretty 
> extreme.
> 
> For comparison, I attempted the same using Perl with the same large 
> file, and I only noticed a 25% difference (10s vs 8s) in performance, 
> which I imagine to be partially attributable to the overhead incurred by 
> using a pipe and its buffer.
> 
> So, is this difference in D-lang performance typical? Is this expected 
> behavior?
> 
> Was wondering if this may have anything to do with the library 
> definition for std.stdio.stdin 
> (https://dlang.org/library/std/stdio/stdin.html)? Does global 
> file-locking significantly affect read-performance?
> 
> For reference: I'm trying to build a single-threaded application; my 
> present use-case cannot benefit from parallelism, because its ultimate 
> purpose is to serve as a single-threaded downstream filter from an 
> upstream application consuming (n-1) system threads.

Are we missing the obvious here? cat has to read from disk and write the 
results into a pipe buffer; the kernel then context-switches to your D 
program, which reads from that pipe buffer.

Reading the file directly skips the extra process and the pipe: the D 
program just reads from the file.

The difference does seem a bit extreme, so there may be another, more 
complex explanation.

But for sure, std.stdio.stdin is itself a File, so reading from stdin 
doesn't do anything different than reading from a file you opened with 
the File struct.

A more appropriate test might be to have the shell feed the file into 
the D program on stdin:

dprogram < FILE

That way the same code runs in both tests, and stdin is the file itself 
rather than a pipe.
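
To make the distinction concrete, here is a small shell sketch (not part 
of the original test) that reports what stdin is attached to in each 
case. The stdin_kind helper and the temporary sample file are made up 
for illustration, and it assumes a system that provides /dev/stdin 
(Linux, for example).

```shell
# Hypothetical helper: report what kind of object stdin is connected to.
# Assumes /dev/stdin exists and resolves to the process's real stdin.
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo "pipe"
    elif [ -f /dev/stdin ]; then
        echo "regular file"
    else
        echo "other"
    fi
    cat > /dev/null    # drain stdin so an upstream writer can finish cleanly
}

sample=$(mktemp)                     # throwaway stand-in for FILE
printf 'one\ntwo\n' > "$sample"

via_pipe=$(cat "$sample" | stdin_kind)    # like: cat FILE | dprogram
via_redirect=$(stdin_kind < "$sample")    # like: dprogram < FILE

echo "cat FILE | prog : $via_pipe"
echo "prog < FILE     : $via_redirect"

rm -f "$sample"
```

In the redirect case the shell opens the file and hands it over as 
stdin, so no second process or pipe buffer sits between the disk and 
the reader.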

-Steve


More information about the Digitalmars-d-learn mailing list