Pop quiz [memory usage]
Fawzi Mohamed
fmohamed at mac.com
Sun Jun 7 02:33:06 PDT 2009
On 2009-06-07 02:23:47 +0200, Lionello Lunesu <lio at lunesu.remove.com> said:
> Vladimir Panteleev wrote:
>> On Sat, 06 Jun 2009 18:12:58 +0300, Jarrett Billingsley
>> <jarrett.billingsley at gmail.com> wrote:
>>
>>> On Sat, Jun 6, 2009 at 8:03 AM, Vladimir
>>> Panteleev<thecybershadow at gmail.com> wrote:
>>>> // Works for DMD1/Phobos, DMD1/Tango and DMD2/Phobos
>>>> version(Tango) import tango.io.Console;
>>>> else import std.stdio;
>>>>
>>>> struct S
>>>> {
>>>>     ubyte[40_000] data;
>>>> }
>>>>
>>>> void main()
>>>> {
>>>>     S[] a;
>>>>     a ~= S();
>>>>
>>>>     // QUESTION: How much memory will this program consume
>>>>     // upon reaching this point?
>>>>     version(Tango) Cin.copyln();
>>>>     else readln();
>>>> }
>>>>
>>>
>>> There seems to be something wrong with the newCapacity function that
>>> _d_arrayappendcT calls. From an element size of 20000 (I halved it
>>> just to make the allocation faster) and an array length of 1, it
>>> somehow calculates the new size to be 266686600. Hm. That seems a
>>> bit off.
>>>
>>> It seems this line:
>>>
>>> long mult = 100 + (1000L * size) / log2plus1(newcap);
>>>
>>> is to blame. I don't think large value types were taken into account
>>> here. The resulting multiplier is 1,333,433, which is hilariously
>>> large.
>>
>> Bull's-eye. I think Mr. Dave Fladebo deserves a math lesson or
>> something, and Mr. Walter Bright should review patches that go into
>> the language's runtime more carefully. :P
>>
>> Can we discuss a suitable replacement? I suggested in #d that we look
>> at how other platforms do it. For example, Java's Vector just doubles
>> its capacity by default (
>> http://java.sun.com/javase/6/docs/api/java/util/Vector.html ).
>>
>
> In my own array classes I'm using Python's: size += max(size>>3, 8);
> Using this, you won't end up wasting a lot of memory. Preallocating is
> always preferred anyway.
>
> L.
Actually, the original function proposed by Dave Fladebo is not so bad:
it calculates the average storage requirement per bit of the size
(total_size/num_bits) and tries to allocate that much extra space.
The advantage is that growth stays exponential-like, which means that
roughly O(log(n)) (maybe O(log(n)*log(log(n)))) reallocations are needed
when continuously appending, while it wastes less and less space the
larger the array gets.
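For reference, here is a small standalone program that reproduces the
numbers quoted above (my own reconstruction; I assume log2plus1 returns
floor(log2(c)) + 1, which is consistent with the quoted results):
{{{
import std.stdio;

// assumed reconstruction of the runtime helper: floor(log2(c)) + 1
int log2plus1(size_t c)
{
    int i = 1;
    while (c >>= 1)
        ++i;
    return i;
}

void main()
{
    size_t size = 20_000;             // the halved element size from above
    size_t newlength = 1;             // one element appended
    size_t newcap = newlength * size; // 20_000 bytes

    // the blamed line, evaluated with these inputs
    long mult = 100 + (1000L * size) / log2plus1(newcap);
    size_t newext = cast(size_t)((newcap * mult) / 100);

    writefln("mult = %s, newext = %s", mult, newext);
    // prints: mult = 1333433, newext = 266686600
}
}}}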
I did propose two "reasonable" possibilities, which I give again here in
a cleaner form:
{{{
const int b = 2;
version(A) {
    const a = 1000L;    // version A: scale by the total size in bytes
} else {
    const a = 100L;     // version B: scale by the number of elements
}

static int log2plusB(size_t c)
{
    int i = b;
    while (c >>= 1) {
        ++i;
    }
    return i;
}

version(A) {
    long mult = 100 + a*b / log2plusB(newcap);
} else {
    long mult = 100 + a*b / log2plusB(newlength);
}

// testing shows 1.02 for large arrays is about the point of
// diminishing return
if (mult < 102)
    mult = 102;
newext = cast(size_t)((newcap * mult) / 100);
newext += size - (newext % size);   // round up to a multiple of the element size
}}}
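To get a feel for the numbers, here is a small harness (my own sketch,
not part of the proposed patch; the 102 floor is left out) that prints
the multiplier each version produces across a range of sizes:
{{{
import std.stdio;

const int b = 2;

int log2plusB(size_t c)
{
    int i = b;
    while (c >>= 1)
        ++i;
    return i;
}

void main()
{
    // version A: a = 1000, driven by the total size in bytes
    size_t[] caps = [4_096, 65_536, 1_048_576, 16_777_216];
    foreach (newcap; caps)
        writefln("newcap = %8s -> mult = %s", newcap,
                 100 + 1000L * b / log2plusB(newcap));

    // version B: a = 100, driven by the number of elements
    size_t[] lens = [1, 16, 1_024, 1_048_576];
    foreach (newlength; lens)
        writefln("newlength = %8s -> mult = %s", newlength,
                 100 + 100L * b / log2plusB(newlength));
}
}}}
With a = 1000 and b = 2, version A's mult goes from about 242 at one
page down toward the 102 floor, and version B's from 200 at one element
down to about 109 at a million elements, so both decay as intended.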
Version A uses the total memory, the other the number of elements.
Increasing a makes the amount of reserved space larger (it is roughly
the maximum amount of extra reserved space, in %).
The value of a I chose differs between the two versions because
min(log2(newcap)) =~ 12 whereas min(log2(newlength)) = 1.
Increasing b makes the decrease flatter, coming closer to treating all
sizes with the same extra space; for small b the decrease for larger
arrays is faster.
In general I think the functional form is sound, and that it is
superior to either a fixed "double the size" or an "add a constant
size" policy. This approach should also be used in other places that
need to reallocate periodically.
The open questions are which values of a and b are optimal, and whether
it is better to use the total size or the number of elements.
Now if someone would do some meaningful benchmarks (on real
applications), I would be very happy; otherwise I will just take one of
my guesses...
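Until then, this is the kind of micro-benchmark I have in mind (a rough
sketch; MonoTime and the array .capacity property are from the current
D2 runtime, so the 2009 runtimes would need their own equivalents, and
real applications are of course the benchmark that matters):
{{{
import std.stdio;
import core.time : MonoTime;

void main()
{
    enum N = 1_000_000;
    auto start = MonoTime.currTime;
    int[] a;
    foreach (i; 0 .. N)
        a ~= i;    // repeatedly exercises the growth policy under test
    auto elapsed = MonoTime.currTime - start;
    // the final capacity shows how much slack the policy reserved
    writefln("%s appends: %s, final capacity: %s", N, elapsed, a.capacity);
}
}}}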
Fawzi