[phobos] Bug 3763 (How to fix readlnImpl())

Andrei Alexandrescu andrei at erdani.com
Thu Feb 4 10:43:33 PST 2010


I think readln(ref char[] buf) is useful for the sake of low-level 
efficiency (irony of the current state of affairs notwithstanding). If I 
were to make an executive decision now, I'd say let's fix the efficiency 
issue on version(DMD_IO) and leave things as they are for the time being.

Andrei

David Simcha wrote:
> The other thing is, how many people are really going to use 
> readln(char[] buf) directly instead of using the version that returns a 
> newly allocated string or using byLine()?  If they're using the version 
> that returns a newly allocated string, this is a non-issue.  If they're 
> using byLine(), then a lot of the logic for recycling buffers could be 
> moved there, where it is safer because it is well-encapsulated.
> 
> On Thu, Feb 4, 2010 at 12:02 PM, Andrei Alexandrescu <andrei at erdani.com 
> <mailto:andrei at erdani.com>> wrote:
> 
>     So it seems like due to the low-level APIs that readlnImpl calls, it
>     flies under the radar of Steve's detection.
> 
>     I think using an Appender with the useExistingBuffer primitive would
>     work, but would be suboptimal - each read can only use as many
>     characters as read the last time, which would trigger quite a few
>     allocation (whenever the line size increases). Please let me know
>     whether this assessment is correct.
> 
>     A possible solution would be to cache the last buffer and its
>     maximum length in static variables inside readlnImpl. I think it's
>     reasonable to say that readln is the owner of the char[] and you
>     shouldn't take portions of it and expect they won't be changed.
>     However, it will not trump over existing arrays.
> 
>     Andrei
> 
>     David Simcha wrote:
> 
> 
> 
>         On Thu, Feb 4, 2010 at 12:10 AM, Andrei Alexandrescu
>         <andrei at erdani.com <mailto:andrei at erdani.com>
>         <mailto:andrei at erdani.com <mailto:andrei at erdani.com>>> wrote:
> 
>            David Simcha wrote:
> 
>                I recently filed bug 3673
>                (http://d.puremagic.com/issues/show_bug.cgi?id=3763) and
>         started
>                looking at possible fixes.  This is a bug in
>                std.stdio.readlnImpl().  However, the more I think about
>         it the
>                more I think the function needs to be completely
>         rethought, not
>                just patched.  I'm not sure if this is a high priority given
>                that it has nothing to do with the language spec and TDPL but
>                it's a pretty embarrassing quality of implementation issue.
> 
>                Anyhow, readlnImpl() takes a ref char[] and tries to
>         recycle the
>                memory to read in another line.  In doing so, it queries
>                GC.capacity for the relevant memory block and resizes the
>         array
>                to the size of the memory block.  This is arguably unsafe
>         in the
>                general case because, if someone passes in a slice that
>         starts
>                at the beginning of a GC block to use as the buffer,
>         everything
>                else in the same GC block can get overwritten.  This
>         massively
>                violates the principle of least surprise.  On the other hand,
>                when encapsulated in the higher level API of byLine(),
>         it's both
>                safe and efficient.
> 
> 
>            Could you please give an example?
> 
>          import std.stdio;
> 
>         void main() {
>            auto writer = File("foo.txt", "wb");
>            foreach(i; 0..1000) {
>                writer.write('a');
>            }
>            writer.writeln();
>            writer.close();
> 
>            auto chars = new char[500];
>            chars[] = 'b';
>            auto buf = chars[0..100];
> 
>            auto reader = File("foo.txt", "rb");
>            reader.readln(buf);
>            writeln(chars[$ - 1]);  // a
>            assert(chars[$ - 1] == 'b');  // FAILS.
>         }
> 
> 
>         ------------------------------------------------------------------------
> 
> 
>         _______________________________________________
>         phobos mailing list
>         phobos at puremagic.com <mailto:phobos at puremagic.com>
>         http://lists.puremagic.com/mailman/listinfo/phobos
> 
>     _______________________________________________
>     phobos mailing list
>     phobos at puremagic.com <mailto:phobos at puremagic.com>
>     http://lists.puremagic.com/mailman/listinfo/phobos
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos


More information about the phobos mailing list