[phobos] Bug 3763 (How to fix readlnImpl())

Andrei Alexandrescu andrei at erdani.com
Thu Feb 4 09:02:10 PST 2010


So it seems like due to the low-level APIs that readlnImpl calls, it 
flies under the radar of Steve's detection.

I think using an Appender with the useExistingBuffer primitive would 
work, but would be suboptimal - each read can only use as many 
characters as read the last time, which would trigger quite a few 
allocation (whenever the line size increases). Please let me know 
whether this assessment is correct.

A possible solution would be to cache the last buffer and its maximum 
length in static variables inside readlnImpl. I think it's reasonable to 
say that readln is the owner of the char[] and you shouldn't take 
portions of it and expect they won't be changed. However, it will not 
trump over existing arrays.

Andrei

David Simcha wrote:
> 
> 
> On Thu, Feb 4, 2010 at 12:10 AM, Andrei Alexandrescu <andrei at erdani.com 
> <mailto:andrei at erdani.com>> wrote:
> 
>     David Simcha wrote:
> 
>         I recently filed bug 3673
>         (http://d.puremagic.com/issues/show_bug.cgi?id=3763) and started
>         looking at possible fixes.  This is a bug in
>         std.stdio.readlnImpl().  However, the more I think about it the
>         more I think the function needs to be completely rethought, not
>         just patched.  I'm not sure if this is a high priority given
>         that it has nothing to do with the language spec and TDPL but
>         it's a pretty embarrassing quality of implementation issue.
> 
>         Anyhow, readlnImpl() takes a ref char[] and tries to recycle the
>         memory to read in another line.  In doing so, it queries
>         GC.capacity for the relevant memory block and resizes the array
>         to the size of the memory block.  This is arguably unsafe in the
>         general case because, if someone passes in a slice that starts
>         at the beginning of a GC block to use as the buffer, everything
>         else in the same GC block can get overwritten.  This massively
>         violates the principle of least surprise.  On the other hand,
>         when encapsulated in the higher level API of byLine(), it's both
>         safe and efficient.
> 
> 
>     Could you please give an example?
> 
>  
> import std.stdio;
> 
> void main() {
>     auto writer = File("foo.txt", "wb");
>     foreach(i; 0..1000) {
>         writer.write('a');
>     }
>     writer.writeln();
>     writer.close();
> 
>     auto chars = new char[500];
>     chars[] = 'b';
>     auto buf = chars[0..100];
> 
>     auto reader = File("foo.txt", "rb");
>     reader.readln(buf);
>     writeln(chars[$ - 1]);  // a
>     assert(chars[$ - 1] == 'b');  // FAILS.
> }
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos


More information about the phobos mailing list