[phobos] Bug 3763 (How to fix readlnImpl())
David Simcha
dsimcha at gmail.com
Thu Feb 4 09:31:28 PST 2010
The other thing is, how many people are really going to use readln(char[]
buf) directly instead of using the version that returns a newly allocated
string or using byLine()? If they're using the version that returns a newly
allocated string, this is a non-issue. If they're using byLine(), then a
lot of the logic for recycling buffers could be moved there, where it is
safer because it is well-encapsulated.
On Thu, Feb 4, 2010 at 12:02 PM, Andrei Alexandrescu <andrei at erdani.com>wrote:
> So it seems like due to the low-level APIs that readlnImpl calls, it flies
> under the radar of Steve's detection.
>
> I think using an Appender with the useExistingBuffer primitive would work,
> but would be suboptimal - each read can only use as many characters as read
> the last time, which would trigger quite a few allocation (whenever the line
> size increases). Please let me know whether this assessment is correct.
>
> A possible solution would be to cache the last buffer and its maximum
> length in static variables inside readlnImpl. I think it's reasonable to say
> that readln is the owner of the char[] and you shouldn't take portions of it
> and expect they won't be changed. However, it will not trump over existing
> arrays.
>
> Andrei
>
> David Simcha wrote:
>
>>
>>
>> On Thu, Feb 4, 2010 at 12:10 AM, Andrei Alexandrescu <andrei at erdani.com<mailto:
>> andrei at erdani.com>> wrote:
>>
>> David Simcha wrote:
>>
>> I recently filed bug 3673
>> (http://d.puremagic.com/issues/show_bug.cgi?id=3763) and started
>> looking at possible fixes. This is a bug in
>> std.stdio.readlnImpl(). However, the more I think about it the
>> more I think the function needs to be completely rethought, not
>> just patched. I'm not sure if this is a high priority given
>> that it has nothing to do with the language spec and TDPL but
>> it's a pretty embarrassing quality of implementation issue.
>>
>> Anyhow, readlnImpl() takes a ref char[] and tries to recycle the
>> memory to read in another line. In doing so, it queries
>> GC.capacity for the relevant memory block and resizes the array
>> to the size of the memory block. This is arguably unsafe in the
>> general case because, if someone passes in a slice that starts
>> at the beginning of a GC block to use as the buffer, everything
>> else in the same GC block can get overwritten. This massively
>> violates the principle of least surprise. On the other hand,
>> when encapsulated in the higher level API of byLine(), it's both
>> safe and efficient.
>>
>>
>> Could you please give an example?
>>
>> import std.stdio;
>>
>> void main() {
>> auto writer = File("foo.txt", "wb");
>> foreach(i; 0..1000) {
>> writer.write('a');
>> }
>> writer.writeln();
>> writer.close();
>>
>> auto chars = new char[500];
>> chars[] = 'b';
>> auto buf = chars[0..100];
>>
>> auto reader = File("foo.txt", "rb");
>> reader.readln(buf);
>> writeln(chars[$ - 1]); // a
>> assert(chars[$ - 1] == 'b'); // FAILS.
>> }
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>> _______________________________________________
>> phobos mailing list
>> phobos at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/phobos
>>
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100204/3b2ef485/attachment-0001.htm>
More information about the phobos
mailing list