Memory leak with dynamic array

Mon Apr 12 09:35:47 PDT 2010

On Mon, 12 Apr 2010 12:03:38 -0400, Joseph Wakeling  
<joseph.wakeling at gmail.com> wrote:

> I thought dev effort was now focusing back on GDC ... ? :-P

AFAIK, gdc hasn't been actively developed for a few years.

ldc, on the other hand, has regular releases.  I think ldc may be the  
future of D compilers, but I currently use dmd since I'm using D2.

> Steven Schveighoffer wrote:
>> The C++ example is reallocating memory, freeing memory it is no longer
>> using.  It also manually handles the memory management, allocating
>> larger and larger arrays in some algorithmically determined fashion
>> (for example, multiplying the length by some constant factor).  This
>> gives it an edge in performance because it does not have to do any
>> costly lookup to determine if it can append in place, plus the realloc
>> of the memory probably is cheaper than the GC realloc of D.
>
> Right.  In fact you get precisely 24 allocs/deallocs, each doubling the
> memory reserve to give a total capacity of 2^23 -- and then that memory
> is there and can be used for the rest of the 100 iterations of the outer
> loop.  The shock for me was finding that D wasn't treating the memory
> like this but was preserving each loop's memory (as you say, for good
> reason).

Yes, you get around this by preallocating.
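
A minimal sketch of the preallocation idea, assuming the inner loop's size is known up front.  The names fillAndSum, buf and n are made up for illustration and are not taken from the original benchmark.  The buffer is allocated once and every pass of the outer loop writes into the same block, so nothing new is handed out by the GC inside the loop.

```d
import std.stdio : writeln;

// Fills an existing buffer in place and returns the sum of what it wrote.
// Nothing is allocated here; the caller owns the memory.
double fillAndSum(double[] buf)
{
    double total = 0;
    foreach (i, ref x; buf)
    {
        x = i * 0.5;
        total += x;
    }
    return total;
}

void main()
{
    enum size_t n = 1_000;        // hypothetical inner-loop size
    auto buf = new double[](n);   // one allocation, done up front

    double grand = 0;
    foreach (iter; 0 .. 100)      // the outer loop reuses the same block
        grand += fillAndSum(buf);

    writeln(grand);
}
```

If the final size is not known exactly, the same pattern should still work with an upper bound and a slice of the part actually filled.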

>> D does not assume you stopped caring about the memory being pointed to
>> when it had to realloc. [...] You can't do the same thing with C++
>> vectors, when they reallocate, the memory they used to own could be
>> freed.  This invalidates all pointers and iterators into the vector,
>> but the language doesn't prevent you from having such dangling pointers.
>
> I have a vague memory of trying to do something exactly like your example
> when I was working with C++ for the first time, and getting bitten on the
> arse by exactly the problem you describe.  I wish I could remember where.
> I know that I found another (and possibly better) solution to do what I
> wanted, but it would be nice to see if a D-ish solution would give me
> something good.

It's often these types of performance discrepancies that critics point to  
(not that you are a critic), but it's the cost of having a more  
comprehensive language.  Your appetite for the sheer performance of a
language will sour once you get bitten by a few of these nasty bugs.
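
To make the trade-off concrete, here is a small sketch (the function name grow is hypothetical) of what "not assuming you stopped caring" looks like in D.  The array is appended to until it almost certainly has to move, yet the older slice stays valid because the GC keeps the old block alive for as long as something refers to it.

```d
import std.stdio : writeln;

// Appends n values to its own copy of the slice and returns the grown array.
int[] grow(int[] a, size_t n)
{
    foreach (i; 0 .. n)
        a ~= cast(int) i;
    return a;
}

void main()
{
    int[] original = [1, 2, 3];
    int[] bigger = grow(original, 100_000);  // may move to a new, larger block

    // The old slice still refers to live memory, so this write is safe.
    original[0] = 42;
    assert(original == [42, 2, 3]);
    assert(bigger.length == 100_003);

    // Whether the data moved is up to the runtime, so just report it.
    writeln(original.ptr is bigger.ptr ? "extended in place" : "reallocated");
}
```

This is also where the extra memory in the original test comes from: the old blocks cannot be reused until the collector can show that nothing points at them any more.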

But D fosters a completely different way of thinking about solving  
problems.  One problem with C++'s vector is it is a value type -- you must  
pass a reference in order to avoid copying an entire vector.  However, D's  
arrays are a hybrid between reference and value type.  Often, once you set  
data in a vector/array, you never change it again.  D provides ways to
enforce this (e.g. immutable) and also allows you to pass around "slices"
of your array with zero overhead (no copying).  This results in some
extremely high-performance code, which wouldn't be easy, or maybe even
possible, with C++.
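
A short sketch of the "hybrid" behaviour, with made-up function names (scale, tryGrow, sum).  Element writes made through a slice are seen by the caller, as with a reference type, while the pointer/length pair itself is passed by value, so an append inside the callee does not change the caller's length.  The second half shows an immutable array being sliced without any copy.

```d
// Element writes go through the slice to the caller's memory.
void scale(int[] data, int k)
{
    foreach (ref x; data)
        x *= k;
}

// Appending changes only this function's local pointer/length pair.
void tryGrow(int[] data)
{
    data ~= 99;
}

// Accepts mutable or immutable slices and promises not to write to them.
int sum(const(int)[] data)
{
    int total = 0;
    foreach (x; data)
        total += x;
    return total;
}

void main()
{
    int[] a = [1, 2, 3];
    scale(a, 10);
    assert(a == [10, 20, 30]);   // reference-like: the elements changed

    tryGrow(a);
    assert(a.length == 3);       // value-like: the caller's length did not

    immutable(int)[] frozen = [1, 2, 3, 4];
    auto tail = frozen[1 .. $];             // a slice, not a copy
    assert(tail.ptr is frozen.ptr + 1);     // same memory, one element in
    assert(sum(tail) == 9);
    // frozen[0] = 5;   // would be rejected at compile time
}
```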

Take for instance a split function.  In C++, I'd expect split(string x) to  
return a vector<string>.  However, vector<string> makes a copy of each  
part of the string it has split out.  D, however, can return references to
the original data (slices), which copy none of the characters.  The only
extra space allocated is the array to hold the string references.  All
this is also completely safe!
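
As an illustration, a hand-written split might look like the sketch below.  splitOn is a hypothetical helper written for this example, not the Phobos function; it only handles a single-character separator.  The pointer check in main shows that the pieces point into the original string rather than at copies.

```d
import std.stdio : writeln;

// Splits s on sep and returns slices of s; no character data is copied.
string[] splitOn(string s, char sep)
{
    string[] parts;
    size_t start = 0;
    foreach (i, c; s)
    {
        if (c == sep)
        {
            parts ~= s[start .. i];
            start = i + 1;
        }
    }
    parts ~= s[start .. $];   // the piece after the last separator
    return parts;
}

void main()
{
    string text = "alpha,beta,gamma";
    auto parts = splitOn(text, ',');

    assert(parts == ["alpha", "beta", "gamma"]);
    assert(parts[1].ptr is text.ptr + 6);   // "beta" lives inside text

    writeln(parts);
}
```

The only allocation is the parts array itself, which holds one pointer/length pair per piece.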

You could then even modify the original string (assuming you were not  
using immutable strings) in place!  Or append to any one of the strings in  
the array safely.
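
A sketch of both points, assuming a current D2 compiler and runtime (withSuffix is a made-up helper).  An in-place edit of the mutable original is visible through the slices, and appending to one slice does not overwrite the data that follows it in the original.

```d
// Appends a suffix to the slice it was handed and returns the result.
char[] withSuffix(char[] word, string suffix)
{
    word ~= suffix;
    return word;
}

void main()
{
    char[] text = "one two".dup;     // mutable copy of the literal
    char[] first = text[0 .. 3];
    char[] second = text[4 .. $];

    text[0] = 'O';                   // edit the original in place
    assert(first == "One");          // the slice sees the change

    char[] longer = withSuffix(first, "!!");
    assert(longer == "One!!");
    assert(text == "One two");       // the neighbouring data is untouched
    assert(second == "two");
}
```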

>> This must be fixed, the appender should be blazingly fast at appending
>> (almost as fast as C++), with the drawback that the overhead is higher.
>
> Overhead = memory cost?  I'm not so bothered as long as the memory stays
> within constant, predictable bounds.  It was the memory explosion that
> scared me.  And I suspect I'd pay a small performance cost (though it
> would have to be small) for the kind of safety and flexibility the arrays
> have.

Overhead = bigger initialization cost, memory footprint.  It's not  
important if you are building a large array (which is what appender should  
be for), but the cost would add up if you had lots of little appenders  
that you didn't append much to.  The point is, the builtin array optimizes  
performance for operations besides append, but allows appending as a  
convenience.  Appender should optimize appending, sacrificing performance  
in other areas.  It all depends on your particular application whether you  
should use appender or builtin arrays (or something entirely  
different/custom).
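
For comparison, here are the two styles side by side, as a sketch only.  It uses std.array's appender, and the function names are hypothetical.  It shows how the calls look and makes no claim about which one is faster on a given compiler release.

```d
import std.array : appender;

// Builds the result through an Appender, reserving space up front.
int[] squaresWithAppender(size_t n)
{
    auto app = appender!(int[])();
    app.reserve(n);
    foreach (i; 0 .. n)
        app.put(cast(int)(i * i));
    return app.data;
}

// Builds the same result with the builtin append operator.
int[] squaresWithBuiltin(size_t n)
{
    int[] result;
    foreach (i; 0 .. n)
        result ~= cast(int)(i * i);
    return result;
}

void main()
{
    assert(squaresWithAppender(5) == [0, 1, 4, 9, 16]);
    assert(squaresWithAppender(5) == squaresWithBuiltin(5));
}
```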

>> You haven't done much with it yet.  When you start discovering how much
>> D takes care of, you will be amazed :)
>
> I know. :-)
>
> My needs are in some ways quite narrow -- numerical simulations in
> interdisciplinary physics -- hence the C background, and hence the
> premium on performance.  They're also not very big programs -- simple
> enough for me to generally keep a personal overview on the memory
> management, even though with C++ that's usually all taken care of
> automatically (no new or delete statements if I can avoid it).

There are many in the community that use D for numerical stuff.  It's  
definitely not as mature as it could be, but getting better.  Don is  
adding a lot of cool stuff to it, including a builtin exponent operator  
and arbitrary precision numbers.
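
A small taste of both features, assuming the exponent operator is spelled ^^ and that the arbitrary precision type meant here is std.bigint's BigInt (that mapping is my assumption).  The factorial helper is made up for the example.

```d
import std.bigint : BigInt;
import std.math;                 // imported explicitly for the ^^ operator
import std.stdio : writeln;

// n! computed with arbitrary-precision integers.
BigInt factorial(uint n)
{
    auto result = BigInt(1);
    foreach (i; 2 .. n + 1)
        result *= i;
    return result;
}

void main()
{
    assert(2 ^^ 10 == 1024);     // integer power
    double area = 3.0 ^^ 2;      // floating-point power
    assert(area == 9.0);

    // 25! no longer fits in a 64-bit integer.
    assert(factorial(25) == BigInt("15511210043330985984000000"));
    writeln(factorial(30));
}
```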

>> The thing about D is it *can* be fast and unsafe, just as fast and
>> unsafe as C, but that's not the default.
>
> That's apparent -- I mean, given that D wraps the whole C standard
> library, I could basically write C code in D if I wanted, no?

Yes, but that's not what I meant ;)  I mean, you can write your own types,  
like the Appender (or what the appender *should* be) that optimize the  
behavior of code to meet any needs.  And it can do it with a much better
syntax than C.  D's template system and its ability to make user types seem
like builtins are, I think, unparalleled in C-like languages.
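
As a rough sketch of what such a user type could look like, here is a hypothetical GrowBuffer.  It is not Phobos code and not how Appender is implemented.  It is a template struct that doubles its capacity when full, in the same spirit as the C++ example, and it overloads ~= and indexing so that it reads like a builtin array.

```d
struct GrowBuffer(T)
{
    private T[] store;     // allocated capacity
    private size_t used;   // number of elements actually in use

    // Enables `buf ~= value`, doubling the capacity when it runs out.
    void opOpAssign(string op)(T value) if (op == "~")
    {
        if (used == store.length)
            store.length = store.length ? store.length * 2 : 4;
        store[used++] = value;
    }

    // Enables `buf[i]` for reading and writing.
    ref T opIndex(size_t i)
    {
        assert(i < used, "index out of range");
        return store[i];
    }

    @property size_t length() const { return used; }

    // The filled part of the buffer as an ordinary slice.
    @property T[] data() { return store[0 .. used]; }
}

void main()
{
    GrowBuffer!double buf;
    foreach (i; 0 .. 10)
        buf ~= i * 0.5;

    assert(buf.length == 10);
    assert(buf[3] == 1.5);

    buf[3] = 7.0;                  // write through the ref returned by opIndex
    assert(buf.data[3] == 7.0);
}
```

The trade-off is the one described earlier in this message: the type carries extra state and an initial allocation, in exchange for appends that never need to ask the runtime whether they can extend in place.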

-Steve