[Issue 5813] New: [patch] std.array.Appender has severe performance and memory leak problems.

d-bugmail at puremagic.com
Sun Apr 3 20:05:39 PDT 2011


http://d.puremagic.com/issues/show_bug.cgi?id=5813

           Summary: [patch] std.array.Appender has severe performance and
                    memory leak problems.
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Keywords: patch, performance
          Severity: normal
          Priority: P3
         Component: Phobos
        AssignedTo: nobody at puremagic.com
        ReportedBy: sandford at jhu.edu
        Depends on: 5233,5780


--- Comment #0 from Rob Jacques <sandford at jhu.edu> 2011-04-03 20:01:59 PDT ---
std.array.Appender currently suffers from a major memory leak and, consequently,
a major performance problem. The primary problem is that Appender isn't sealed
and must let the GC collect the unused arrays it allocates. Even in a
micro-benchmark program, false pointers can keep excess memory alive: a
micro-benchmark that simply generates a 10 MB array of reals used anywhere
between 30 and 116 MB of RAM (50 MB seemed about average). For me, the major
issue with Appender has been that generating a 150 MB array tends to run my
program out of memory (and takes minutes even when it succeeds).

Recently, bearophile posted a set of micro-benchmarks
(http://www.digitalmars.com/d/archives/digitalmars/D/Conversion_to_string_string_building_benchmark_132129.html).
I'll highlight two results: test1, which appended to!string(uint), and test2,
which appended "12345678", took 7.38 and 5.91 seconds, respectively, using the
old appender. With this patch, the times on my machine are 9.0 and 0.40 seconds,
respectively. Memory usage also decreased from 290 MB to 146 MB with this
patch.

Besides better large-scale performance, I wrote some micro-benchmarks to test
usage as a small buffer and as a small array builder (~100 elements). I saw
minor performance increases of 3% and 10%, respectively.

Needless to say, I've seen a dramatic performance increase in my production
code, quite literally from minutes to milliseconds.

Implementation changes  
=======================================================
I've changed the basic architecture of Appender from a 'smart' array to a
linked list of arrays. data(), therefore, has to combine these segments to
return a single array. If data() performs a new allocation and copy, the
resulting array is used to create a new head node, so repeated calls to data()
don't allocate additional arrays.
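
A quick sketch of the intended behaviour (not part of the patch; it assumes the
patched Appender and enough puts to span several segments):

    import std.array;

    void main() {
        Appender!(int[]) app;
        foreach(i; 0 .. 10_000)      // enough elements to span several segments
            app.put(i);

        auto a = app.data;           // flattens the segments into one array
        auto b = app.data;           // reuses that same array
        assert(a.ptr is b.ptr);      // no second allocation & copy
    }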

The growth schedule has also changed: the capacity of each new segment is double
that of the previous one, which gives a greater-than-2x growth in total length
while the appender is still smallish. I've limited the capacity-based growth to
4 KB, i.e. one memory page. As no copying is done on growth, the only real cost
I need to amortize is allocation, and I felt 4 KB was a good trade-off point
between keeping memory tight and allocation cost.
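
To make the schedule concrete, here is a small stand-alone sketch (not part of
the patch) of the segment sizes for 4-byte elements, assuming the patch's
16-element (64-byte) initial segment:

    import std.algorithm : min;
    import std.stdio;

    void main() {
        enum PageSize = 4096;               // one memory page, the growth cap
        size_t capBytes = 16 * int.sizeof;  // initial segment: 16 ints = 64 bytes
        foreach(i; 0 .. 8) {
            writefln("segment %s: %s bytes", i, capBytes);
            capBytes = min(capBytes * 2, PageSize);   // double, capped at one page
        }
        // Prints 64, 128, 256, 512, 1024, 2048, 4096, 4096
    }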

For performance, I found that I shouldn't keep a current node pointer and should
instead use a tail node pointer. A side effect of this optimization is that
shrinkTo may drop previously allocated segments, and clear only retains the
last node, as it is assumed to be the largest.

RefAppender was updated to support the changes in Appender cleanly.

I've added some additional methods to the API (usage sketch below):
-slack: Returns the number of elements that can be added before allocation.
-extend: Increases the slack by at least N elements.
-dup: Returns a mutable copy of the data.
-idup: Returns an immutable copy of the data.
-this(size_t N): Constructs an appender with a capacity of at least N elements.
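
A hedged usage sketch of these additions (again assuming the patched Appender,
not the current std.array one):

    import std.array;

    void main() {
        auto app = Appender!(int[])(100);    // capacity of at least 100 elements
        assert(app.slack >= 100);            // room before the next allocation

        app.put([1, 2, 3]);
        app.extend(1_000);                   // make room for at least 1000 more elements

        int[]            m = app.dup;        // mutable copy of the data
        immutable(int)[] s = app.idup;       // immutable copy of the data
        assert(m == [1, 2, 3] && s == m);
    }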


[patch] Don't forget the dependent patches (bugs 5780, 5233)  
=================================================================================

/** Implements an output range which stores appended data. This is recommended
 *  over $(D a ~= data) when appending many elements because it is more memory
 *  and computationally efficient.
 *
 *  Example:
    ----
    auto buffer = Appender!(char[])(4);             // Reserve 4 elements
    foreach( str; ["Hello"d, "World"d] ) {
        buffer.clear;                               // Use appender as a
        foreach( dchar c; str )                     // dynamically sized buffer
            put( buffer, c    );
        assert( buffer.data == str );
    }

    int[] a = [ 1, 2 ];
    auto builder = appender(a);                     // Or for array building
    builder.put(3);
    builder.put([ 4, 5, 6 ]);
    assert(builder.data == [ 1, 2, 3, 4, 5, 6 ]);
    ----
 */
struct Appender(A : T[], T) {
    private {
        enum  PageSize = 4096;          // Memory page size
        alias Unqual!T E;               // Internal element type

        struct Data {
            Data*       next;           // The next data segment
            size_t      capacity;       // Capacity of this segment
            E[]         arr;            // This segment's array

            // Initialize a segment using an existing array
            void opAssign(E[] _arr) {
                next           = null;
                capacity       = _arr.capacity;
                arr            = _arr;
                if(_arr.length < capacity) {
                    arr.length = capacity;
                    arr.length = _arr.length;
                }
                assert(_arr.ptr is arr.ptr,"Unexpected reallocation occurred");
            }

            // Create a new segment using an existing array
            this(Unqual!T[] _arr) { this = _arr; }

            // Create a new segment with at least size bytes
            this(size_t size) {
                if(size > PageSize)
                    size = (size +  PageSize-1) & ~(PageSize-1);
                auto bi  = GC.qalloc(size, !hasIndirections!T * 2);
                next     = null;
                capacity = bi.size / T.sizeof;
                arr      = (cast(E*)bi.base)[0..0];
                static assert(!false*2 == GC.BlkAttr.NO_SCAN);
            }
        }
        Data*  _head = null;                   // The head data segment
        Data*  _tail = null;                   // The last data segment

        // Returns: the total number of elements in the appender
        size_t _length() {
            size_t len = 0;
            for(auto d = _head; d !is null; d = d.next)
                len   += d.arr.length;
            return len;
        }

        // Flatten all the data segments into a single array
        E[] flatten() {
            if(_head is _tail)
                return _head ? _head.arr : null;

            size_t N   = _length;
            size_t len = N;
            size_t i   = 0;
            auto arr   = new E[N];
            for(auto d = _head; N > 0; d = d.next, N -= len, i += len) {
                len    = min(N, d.arr.length);
                memcpy(arr.ptr+i, d.arr.ptr, len * T.sizeof);
            }
            return arr;
        }

        // Returns: the size in bytes of the next segment
        size_t nextCapacity() nothrow pure {
            auto   cap = _tail.capacity * T.sizeof * 2;
            return cap < PageSize ? cap : PageSize;
        }
    }

    /** Construct an appender with a given array.  Note that this does not copy
     *  the data.  If the array has a larger capacity as determined by
     *  arr.capacity, it will be used by the appender.  After initializing an
     *  appender on an array, appending to the original array will reallocate.
     */
    this(T[] arr) {
        if(arr is null) _head = _tail = new Data( 16 * T.sizeof );
        else            _head = _tail = new Data( cast(E[]) arr );
    }

    /// Construct an appender with a capacity of at least N elements.
    this(size_t N) { _head = _tail = new Data( N * T.sizeof ); }

    /// Returns: a mutable copy of the data.
    E[] dup()  {
        return _head !is _tail ? flatten : flatten.dup;
    }

    /// Returns: an immutable copy of the data.
    immutable(E)[] idup() {
        return cast(immutable(E)[]) dup;
    }

    /// Returns: the appender's data as an array.
    T[] data() {
        auto arr = flatten;
        if(_head !is _tail) {
            *_head = arr;
             _tail = _head;
        }
        return cast(T[]) arr;
    }

    /// Returns: the number of elements that can be added before allocation.
    size_t slack() {
        return _tail ? _tail.capacity - _tail.arr.length : 0;
    }

    /// Increases the slack by at least N elements
    void extend(size_t N) {
        assert( size_t.max / T.sizeof  >= N, "Capacity overflow.");
        auto size = N * T.sizeof;

        // Initialize if not already done
        if(!_tail) { _head = _tail = new Data( size ); return; }

        // Try extending the current tail block in place
        if( auto u = GC.extend(_tail.arr.ptr, size, size) ) {
            _tail.capacity = u / T.sizeof;
            return;
        }

        // If the tail is full, add a new segment
        if(_tail.arr.length == _tail.capacity) {
            _tail = _tail.next = new Data( size );
            return;
        }

        // Allocate & copy
        auto next = Data(size);
        memcpy(next.arr.ptr, _tail.arr.ptr, _tail.arr.length * T.sizeof);
        _tail.arr       = next.arr.ptr[0.._tail.arr.length];
        _tail.capacity  = next.capacity;
    }

    /// Returns: the total number of elements currently allocated.
    size_t capacity() {
        size_t cap = 0;
        for(auto d = _head; d !is null; d = d.next)
            cap   += d.capacity;
        return cap;
    }

    /// Ensures that the capacity is at least newCapacity elements.
    void reserve(size_t newCapacity) {
        auto cap  = capacity;
        if(  cap >= newCapacity) return;
        extend( newCapacity - cap );
    }

    /// Appends to the output range
    void put(U)(U item) if ( isOutputRange!(Unqual!T[],U) ){
        // put(T)
        static if ( isImplicitlyConvertible!(U, E) ){
            if(!_head)
                _head = _tail  = new Data( 16 * T.sizeof );
            else if( _tail.arr.length == _tail.capacity  ) {   // Try extending
                if( auto u = GC.extend(_tail.arr.ptr, T.sizeof, nextCapacity) )
                     _tail.capacity     = u / T.sizeof;
                else _tail = _tail.next = new Data( nextCapacity );
            }
            auto          len  = _tail.arr.length;
            _tail.arr.ptr[len] = item;
            _tail.arr          = _tail.arr.ptr[0 .. len + 1];

        // fast put(T[])
        } else static if( is(typeof(_tail.arr[0..1] = item[0..1])) ){
            auto items  = cast(E[]) item[];
            if(!_tail)
                _head   = _tail = new Data(  items.length * T.sizeof );
            auto arr    = _tail.arr.ptr[_tail.arr.length.._tail.capacity];
            size_t len  = items.length;
            if(arr.length < len) {                             // Try extending
                auto size  = max(items.length*T.sizeof, nextCapacity);
                if( auto u = GC.extend(_tail.arr.ptr, T.sizeof, size) ) {
                    _tail.capacity = u / T.sizeof;
                    arr    = _tail.arr.ptr[_tail.arr.length.._tail.capacity];
                }
                if(arr.length < len) len = arr.length;
            }
            arr[0..len] = items[0..len];
            items       = items[len..$];
            _tail.arr   = _tail.arr.ptr[0 .. _tail.arr.length + len];
            if( items.length > 0 ) {               // Add a segment and advance
                _tail.next = new Data(max(items.length*T.sizeof,nextCapacity));
                _tail      = _tail.next;
                _tail.arr.ptr[0..items.length] = items[];
                _tail.arr   = _tail.arr.ptr[0..items.length];
            }

        // Everything else
        } else {
            .put!(typeof(this),U,true,Unqual!T)(this,item);
        }
    }

    // only allow overwriting data on non-immutable and non-const data
    static if(!is(T == immutable) && !is(T == const)) {
        /** Clears the managed array. This function may reduce the appender's
         *  capacity.
         */
        void clear() {
            if(_tail is null) return;     // Nothing has been allocated yet
            _head     = _tail;            // Save the largest chunk and move on
            _tail.arr = _tail.arr.ptr[0..0];
        }

        /** Shrinks the appender to a given length. Passing in a length that's
         *  greater than the current array length throws an enforce exception.
         *  This function may reduce the appender's capacity.
         */
        void shrinkTo(size_t newlength) {
            for(auto d = _head; d !is null; d = d.next) {
                if(d.arr.length >= newlength) {
                    d.arr  = d.arr.ptr[0..newlength];
                    d.next = null;
                    _tail  = d;          // Any following segments are dropped
                }
                newlength -= d.arr.length;
            }
            enforce(newlength == 0, "Appender.shrinkTo: newlength > capacity");
        }
    }
}

/** An appender that can update an array in-place.  It forwards all calls to an
 *  underlying appender implementation.  Calling data updates the pointer to
 *  the original array passed in.
 */
struct RefAppender(A : T[], T) {
    private {
        Appender!(A, T) impl;
        T[] *arr;
    }

    /** Construct a ref appender with a given array reference.  This does not
     *  copy the data.  If the array has a larger capacity as determined by
     *  arr.capacity, it will be used by the appender.  $(D RefAppender)
     *  assumes that arr is a non-null value.
     *
     *  Note, do not use builtin appending (i.e. ~=) on the original array
     *  passed in until you are done with the appender, because calls to the
     *  appender override those appends.
    */
    this(T[] *arr) {
        impl     = Appender!(A, T)(*arr);
        this.arr = arr;
    }

    auto opDispatch(string fn, Args...)(Args args)
        if (is(typeof(mixin("impl." ~ fn ~ "(args)"))))  {
        mixin("return impl." ~ fn ~ "(args);");
    }

    /// Returns the appender's data as an array and updates the original array.
    T[] data() {
         auto  arr = impl.data;
         *this.arr = arr;
        return arr;
    }
}
