[Issue 17229] File.byChunk w/ stdout.lockingTextWriter is very slow

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Wed Mar 1 01:05:19 PST 2017


https://issues.dlang.org/show_bug.cgi?id=17229

--- Comment #3 from Jon Degenhardt <jrdemail2000-dlang at yahoo.com> ---
I've confirmed that File.byChunk with lockingTextWriter corrupts UTF-8 encoded
files.

I used the Unicode test file:
    http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt

and the example given with the File.byChunk documentation:

    // Efficient file copy, 1MB at a time.
    import std.algorithm, std.stdio;
    void main()
    {
        stdin.byChunk(1024 * 1024).copy(stdout.lockingTextWriter());
    }

This file copy program corrupts the Unicode characters as described in Steven's
comment. This is quite problematic, both because of the character corruption
and because it is an example in the documentation.

The new method, lockingBinaryWriter, copies the file correctly. It is available
starting with 2.073.1. lockingBinaryWriter also copies the file quickly,
eliminating the performance issue.
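
For reference, a minimal sketch of the same copy program with only the writer
swapped. It assumes a compiler at 2.073.1 or later, and it has not been
benchmarked beyond the observation above:

```d
// File copy, 1MB at a time, writing the raw bytes unchanged.
import std.algorithm : copy;
import std.stdio : stdin, stdout;

void main()
{
    // byChunk yields ubyte[] buffers; lockingBinaryWriter writes them
    // as-is, so multi-byte UTF-8 sequences pass through untouched.
    stdin.byChunk(1024 * 1024).copy(stdout.lockingBinaryWriter());
}
```

Being a binary writer, it presumably performs no newline translation on the
way out.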

It appears from the PR for lockingBinaryWriter
(https://github.com/dlang/phobos/pull/2011) that there was discussion of the
roles of the Binary and Text writers.

Regardless of the availability of lockingBinaryWriter, lockingTextWriter
certainly looks broken when used with the ubyte data type. Personally, I think
it makes sense for lockingTextWriter to assume ubyte arrays are correctly
encoded, or perhaps are UTF-8 encoded. This would potentially allow newline
translation, something that lockingBinaryWriter would presumably not do.
