[Issue 17229] New: File.byChunk w/ stdout.lockingTextWriter is very slow
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Mon Feb 27 00:40:24 PST 2017
https://issues.dlang.org/show_bug.cgi?id=17229
Issue ID: 17229
Summary: File.byChunk w/ stdout.lockingTextWriter is very slow
Product: D
Version: D2
Hardware: x86
OS: Mac OS X
Status: NEW
Severity: enhancement
Priority: P1
Component: phobos
Assignee: nobody at puremagic.com
Reporter: jrdemail2000-dlang at yahoo.com
Reading a file with File.byChunk and writing it out with
stdout.lockingTextWriter is very slow: dramatically slower (about 17x in the
measurements below) than the same copy done with File.byLine.
It is not clear whether there is a real connection between File.byChunk and
stdout.lockingTextWriter, but for operations that read and access the
data without writing, File.byChunk is faster than File.byLine.
---Copy file byChunk code---
auto chunkedStream = filename.File.byChunk(1024*1024);
auto stdoutWriter = stdout.lockingTextWriter;
chunkedStream.each!(x => put(stdoutWriter, x));
---Copy file byLine code---
auto chunkedStream = filename.File.byLine(Yes.keepTerminator);
auto stdoutWriter = stdout.lockingTextWriter;
chunkedStream.each!(x => put(stdoutWriter, x));
The above in a simple main program, copying a 2.7 GB, 14 million line file, has
the following times (ldc 1.1 -release -O -boundscheck=off):
byLine: 2.09 seconds
byChunk: 35.24 seconds
A 17x delta. I tried a number of different formulations of the code; the result
was the same each time.
Changing the program to read and access the data without writing changes
things so that byChunk is faster.
---Count 9's byChunk code fragment---
auto chunkedStream = filename.File.byChunk(1024*1024);
size_t count = 0;
chunkedStream.each!(x => count += x.count('9'));
writefln("Found %d '9's", count);
---Count 9's byLine code fragment---
auto chunkedStream = filename.File.byLine(Yes.keepTerminator);
size_t count = 0;
chunkedStream.each!(x => count += x.count('9'));
writefln("Found %d '9's", count);
Results for the count 9's program, against the 2.7 GB, 14 million line file:
byLine: 8.98 seconds
byChunk: 1.64 seconds
Different formulations of the above have the same result, including the
formulations shown in the byChunk documentation.
The above suggests that reading with File.byChunk may not be problematic by
itself, but that the slow writing is somehow connected.
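One way to probe that suggestion would be to keep File.byChunk for reading but write each chunk with File.rawWrite instead of going through stdout.lockingTextWriter. The sketch below is not part of the original tests and no timings are claimed for it; the helper name copyByChunkRaw and the program name in the synopsis string are hypothetical, chosen only for illustration.

```d
import std.stdio;

/// Hypothetical helper: copies `input` to `output` in fixed-size chunks
/// using File.rawWrite, so lockingTextWriter is not involved at all.
/// Returns the number of bytes copied.
size_t copyByChunkRaw(File input, File output, size_t chunkSize = 1024 * 1024)
{
    size_t total = 0;
    foreach (ubyte[] chunk; input.byChunk(chunkSize))
    {
        output.rawWrite(chunk);   // writes the whole buffer in one call
        total += chunk.length;
    }
    return total;
}

void main(string[] cmdArgs)
{
    if (cmdArgs.length < 2)
    {
        writeln("synopsis: copyfile_rawwrite file");
        return;
    }
    // "-" selects stdin, mirroring the convention of the byChunk test program.
    auto input = (cmdArgs[1] == "-") ? stdin : File(cmdArgs[1]);
    copyByChunkRaw(input, stdout);
}
```

If this variant ran at a speed comparable to the byLine copy, that would point at the write path rather than at byChunk itself; if it were just as slow, the read side would need another look.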
---Full program used for byChunk---
import std.algorithm;
import std.range;
import std.stdio;

void main(string[] cmdArgs)
{
    if (cmdArgs.length < 2)
    {
        writeln("synopsis: copyfile_bychunk file");
    }
    else
    {
        auto filename = cmdArgs[1];
        // "-" reads from stdin; anything else is opened as a file.
        auto chunkedStream = (filename == "-") ? stdin.byChunk(1024*1024) :
            filename.File.byChunk(1024*1024);
        auto stdoutWriter = stdout.lockingTextWriter;
        chunkedStream.each!(x => put(stdoutWriter, x));
    }
}
The other test programs were written similarly. Tests were on OS X.
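For reference, a byLine counterpart of the copy program might look like the sketch below. It is a reconstruction from the byLine fragment and the byChunk program above, not the exact file used for the timings; the helper name copyByLine and the program name in the synopsis string are hypothetical.

```d
import std.algorithm;
import std.range;
import std.stdio;
import std.typecons : Yes;

/// Hypothetical helper: copies `input` to `output` line by line,
/// keeping line terminators, through a lockingTextWriter.
void copyByLine(File input, File output)
{
    auto lineStream = input.byLine(Yes.keepTerminator);
    auto writer = output.lockingTextWriter;
    lineStream.each!(x => put(writer, x));
}

void main(string[] cmdArgs)
{
    if (cmdArgs.length < 2)
    {
        writeln("synopsis: copyfile_byline file");
    }
    else
    {
        auto filename = cmdArgs[1];
        // "-" reads from stdin; anything else is opened as a file.
        auto input = (filename == "-") ? stdin : File(filename);
        copyByLine(input, stdout);
    }
}
```

The only differences from the byChunk program are the range used for reading and the element type it yields (char[] lines rather than ubyte[] chunks).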