Pop quiz [memory usage]

Fawzi Mohamed fmohamed at mac.com
Sun Jun 7 02:33:06 PDT 2009


On 2009-06-07 02:23:47 +0200, Lionello Lunesu <lio at lunesu.remove.com> said:

> Vladimir Panteleev wrote:
>> On Sat, 06 Jun 2009 18:12:58 +0300, Jarrett Billingsley 
>> <jarrett.billingsley at gmail.com> wrote:
>> 
>>> On Sat, Jun 6, 2009 at 8:03 AM, Vladimir
>>> Panteleev<thecybershadow at gmail.com> wrote:
>>>> // Works for DMD1/Phobos, DMD1/Tango and DMD2/Phobos
>>>> version(Tango) import tango.io.Console;
>>>> else           import std.stdio;
>>>> 
>>>> struct S
>>>> {
>>>>        ubyte[40_000] data;
>>>> }
>>>> 
>>>> void main()
>>>> {
>>>>        S[] a;
>>>>        a ~= S();
>>>> 
>>>>        // QUESTION: How much memory will this program consume upon reaching
>>>> this point?
>>>>        version(Tango) Cin.copyln();
>>>>        else           readln();
>>>> }
>>>> 
>>> 
>>> There seems to be something wrong with the newCapacity function that
>>> _d_arrayappendcT calls.  From an element size of 20000 (I halved it
>>> just to make the allocation faster) and an array length of 1, it
>>> somehow calculates the new size to be 266686600.  Hm.  That seems a
>>> bit off.
>>> 
>>> It seems this line:
>>> 
>>> long mult = 100 + (1000L * size) / log2plus1(newcap);
>>> 
>>> is to blame.  I don't think large value types were taken into account
>>> here.  The resulting multiplier is 1,333,433, which is hilariously
>>> large.
>> 
>> Bull's-eye. I think Mr. Dave Fladebo deserves a math lesson or 
>> something, and Mr. Walter Bright should review patches that go into the 
>> language's runtime more carefully. :P
>> 
>> Can we discuss a suitable replacement? I suggested in #d that we look 
>> at how other platforms do it. For example, Java's Vector just doubles 
>> its capacity by default ( 
>> http://java.sun.com/javase/6/docs/api/java/util/Vector.html ).
>> 
> 
> In my own array classes I'm using Python's: size += max(size>>3, 8);
> Using this, you won't end up wasting a lot of memory. Preallocating is 
> always preferred anyway.
> 
> L.
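
(In D, that Python-style rule would look roughly like this -- a sketch of 
the rule itself, not Lionello's actual array class:

{{{
// sketch of the Python-style growth rule: size += max(size>>3, 8)
size_t pythonGrow(size_t size)
{
    size_t step = size >> 3;   // grow by ~12.5%...
    if (step < 8)
        step = 8;              // ...but by at least 8 elements
    return size + step;
}
}}}

Starting from zero this gives 8, 16, 24, ..., and once the array passes 64 
elements it grows by about 12.5% per reallocation -- still geometric, so 
O(log(n)) reallocations, but with far less slack than doubling.)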

Actually, the original function proposed by Dave Fladebo is not so bad: 
it computes the average storage requirement per bit 
(total_size/num_bits) and tries to allocate that much extra space.
The advantage is that the growth stays exponential-like, which means 
~O(log(n)) (maybe O(log(n)*log(log(n)))) reallocations are needed when 
continuously appending, while wasting less and less space the larger 
the array gets.
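
For reference, the blow-up Jarrett saw comes straight out of that formula 
once the element size is large. A minimal sketch (assuming log2plus1, which 
is not shown in the quote, simply counts the bits of its argument starting 
at 1) reproduces his numbers:

{{{
import std.stdio;

// assumed shape of the runtime helper: number of bits in c, from 1
int log2plus1(size_t c)
{
    int i = 1;
    while (c >>= 1)
        ++i;
    return i;
}

void main()
{
    size_t size = 20_000;       // element size from Jarrett's test
    size_t newcap = 1 * size;   // capacity in bytes after one append

    long mult = 100 + (1000L * size) / log2plus1(newcap);
    size_t newext = cast(size_t)((newcap * mult) / 100);

    writefln("mult=%s newext=%s", mult, newext);
    // prints mult=1333433 newext=266686600, matching the numbers above
}
}}}

With size = 20000 and newcap = 20000, log2plus1 returns 15, so 
mult = 100 + 20_000_000/15 = 1_333_433: the array is grown to ~13000 times 
the needed size. Replacing the 1000L*size term with a*b bounds the 
multiplier to 100+a, which is what the code below does.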
I did propose two "reasonable" possibilities, which I give again here 
in a cleaner form:
{{{
            const int b = 2;
            version(A){
                const a = 1000L;
            } else {
                const a = 100L;
            }

            // number of bits in c, counting from b instead of 1
            static int log2plusB(size_t c)
            {
                int i = b;
                while(c >>= 1){
                    ++i;
                }
                return i;
            }

            version(A){
                long mult = 100 + a*b / log2plusB(newcap);
            } else {
                long mult = 100 + a*b / log2plusB(newlength);
            }

            // testing shows 1.02 for large arrays is about the point of diminishing return
            if (mult < 102)
                mult = 102;
            newext = cast(size_t)((newcap * mult) / 100);
            newext += size - (newext % size);
}}}
Version A uses the total memory (newcap), the other the number of 
elements (newlength).

Increasing a makes the amount of reserved space larger (a is the 
maximum amount of extra reserved space, in %).
The value of a I chose differs between the two versions because 
min(log2(newcap)) =~ 12 while min(log2(newlength)) = 1.

Increasing b makes the decrease flatter, coming closer to treating all 
sizes with the same extra space; for small b the extra space falls off 
faster for larger arrays.
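
For a feel of what these parameters do, here is a small driver 
(hypothetical, not part of the proposed patch) that prints the multiplier 
each variant yields at a few sizes, using the same log2plusB as above:

{{{
import std.stdio;

const int b = 2;

int log2plusB(size_t c)
{
    int i = b;
    while (c >>= 1)
        ++i;
    return i;
}

// variant A: a = 1000, driven by the total capacity in bytes
long multA(size_t newcap)
{
    long mult = 100 + 1000L * b / log2plusB(newcap);
    return mult < 102 ? 102 : mult;
}

// variant B: a = 100, driven by the number of elements
long multB(size_t newlength)
{
    long mult = 100 + 100L * b / log2plusB(newlength);
    return mult < 102 ? 102 : mult;
}

void main()
{
    for (size_t cap = 4096; cap <= 1 << 24; cap <<= 4)
        writefln("newcap=%s  mult=%s%%", cap, multA(cap));
    for (size_t len = 1; len <= 1 << 20; len <<= 5)
        writefln("newlength=%s  mult=%s%%", len, multB(len));
}
}}}

With these values, variant A reserves ~142% extra at a 4 KB capacity, 
falling to ~76% extra at 16 MB, while variant B ranges from 100% extra 
for a single element down to ~9% for a million elements.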

In general I think the functional form is sound, and that it is 
superior to either a fixed "double the size" or "add a constant size" 
policy. This approach should also be used in other places that need to 
reallocate periodically.
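
For instance, a free-standing growth helper reusing the same form (a 
hypothetical sketch, with the version-B parameters from above) could look 
like:

{{{
// hypothetical generic growth policy reusing the same functional form
size_t growCapacity(size_t curBytes, size_t elemSize)
{
    const int b = 2;
    const long a = 100L;

    // inline log2plusB of the current size
    int bits = b;
    for (size_t c = curBytes; c >>= 1;)
        ++bits;

    long mult = 100 + a * b / bits;
    if (mult < 102)
        mult = 102;   // 1.02 floor, as in the proposal above

    size_t next = cast(size_t)((curBytes * mult) / 100);
    next += elemSize - (next % elemSize);   // advance to the next element boundary
    return next;
}
}}}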

The question is which values of a and b are optimal, and whether it is 
better to use the total size or the number of elements.
Now if someone would do some meaningful benchmarks (on real 
applications), I would be very happy; otherwise I will just take one of 
my guesses...

Fawzi



