Big Oversight with readln?
Jonathan Marler via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Feb 23 20:22:17 PST 2017
On Friday, 24 February 2017 at 03:45:35 UTC, Nick Sabalausky
(Abscissa) wrote:
> On 02/23/2017 09:43 PM, Jonathan Marler wrote:
>> I can't figure out how to make use of the full capacity of
>> buffers that
>> are allocated by readln. Take the example code from the
>> documentation:
>>
>> // Read lines from $(D stdin) and count words
>>
>> void main()
>> {
>> char[] buf;
>> size_t words = 0;
>>
>> while (!stdin.eof)
>> {
>> char[] line = buf;
>> stdin.readln(line);
>> if (line.length > buf.length)
>> buf = line;
>>
>> words += line.split.length;
>> }
>>
>> writeln(words);
>> }
>>
>> When buf is not large enough to hold the line, readln will
>> allocate a
>> new buffer to accomodate and this example shows how you can
>> save that
>> new buffer to reuse the next time. The problem is that the
>> capacity of
>> the new buffer is nowhere to be found. readln only returns the
>> line that
>> was read which is only a slice of the buffer that was
>> allocated. The
>> next time that readln is called, it will not read past the
>> slice even if
>> the capacity of the buffer it allocated was larger. This will
>> cause a
>> new allocation/copy every time you read a line that was larger
>> than all
>> the previous lines, even if a previous allocation was already
>> large
>> enough. This seems like a big oversight to me, I must be
>> missing
>> something right?
>
> I don't think that problem is actually occurring:
>
> Let's step through the code, and suppose you're reading the
> following four lines of text:
>
> 12345
> 123456789
> 123
> 1234567
>
> Starting out, buf.length is 0. When reading the first line, the
> buffer isn't big enough for 5, so readln allocates returns new
> buffer of length 5. That is more than buf.length (0), so the
> new buffer becomes the new buf.
>
> Second line, again, buf (length 5) isn't big enough for 9, so
> readln allocates a new buffer length 9. That's more than the
> old one (5), so again your code sets buf to the larger new
> buffer (length 9).
>
> Third line: buf (length 9) can definitely hold length 3, so
> readln does not allocate. The new slice returned (length 3) is
> NOT longer than buf (still length 9), so buf is NOT set to the
> slice returned by readln. So buf REMAINS length 9.
>
> Fourth line: buf (still length 9) can definitely hold length 7,
> so readln does not allocate.
You're looking at this from the apps perspective and forgetting
about what readln is doing under the hood. It can't know how big
the next line is going to be before it reads it so it's going
guess how much to allocate. If you look at the implementation in
(http://github.com/dlang/phobos/blob/master/std/stdio.d), you can
see it doubles the size of the current buffer and adds some more
for good measure (on line 4479 as of writing this).
So in your example after it reads the first line, its going to
allocate an initial buffer of some size, maybe 200 or so, then
eventually returns a slice of the first 5 characters into that
buffer. When it reads the second line it can't use the rest of
that initial buffer because the size of the buffer is gone, so it
has to allocate a new buffer. At least that's what I thought
until I found what I was missing!
I discovered the .capacity property of arrays. I don't know why
I've never seen this but it looks like this is how readln is
recovering this seemingly lost peice of data. This does have an
odd consequence though, if you pass a slice into readln it will
read past the end of it if the underlying buffer is larger. This
might be something worth adding to the documentation.
Also I'm not completely sure how .capacity works, I assume it has
to look up this information in the memory management metadata.
Any enlightenment on this subject is appreciated. It also says
it's a O(log(n)) operation so I'm guessing it's looking it up in
some sort of binary tree data structure.
More information about the Digitalmars-d-learn
mailing list