Memory leak with dynamic array

Mon Apr 12 09:03:38 PDT 2010

Thanks to all again for the discussion, examples and explanations. :-)

One note -- I wouldn't want anyone to think I'm bashing D or complaining.
I've been interested in the language for some time and this seemed an
opportune time to start experimenting properly.  It's fun, I'm learning
a lot, and I'm genuinely touched by the amount of effort put in by
everyone on this list to teach and share examples.

I'm also fully aware that D is still growing, and that I need to be
patient in some cases ... :-)

bearophile wrote:
> D dynamic arrays are more flexible than C++ vector, they can be sliced,
> such slicing is O(1), and the slices are seen by the language just like
> other arrays. So you pay the price of some performance for such
> increased flexibility. The idea here is that the built-in data types
> must be as flexible as possible even if their performance is not so
> high, so they can be used for many different purposes.

No complaint there. :-)

> Then D standard library will have specialized data structures that are
> faster thanks to being more specialized and less flexible.

In my case -- I'm turning into 'Mr C++' again -- probably that's often
what I need.  If I look at the major benefits I found in moving from C
to C++, the first was memory management that was as automatic as I required.
For example, C++ vectors are great because they do away with having to
put in malloc/realloc/free statements and let you treat dynamic arrays
pretty much as 'just another variable'.

Within my own needs I've not yet found a case where the kind of smart GC
functionality discussed on this thread seemed necessary, but of course
I've never had it available to use before ... :-)

> In D dynamic arrays some of the performance price is also paid for the
> automatic memory management, for the GC that's not a precise GC (for
> example if your array has some empty items at the end past its true
> length, the GC must ignore them).

An idea was floating in my head about whether it is/could be possible to
turn off GC safety features in a scope where they are unnecessary --
rather like a more general version of the 'assumeSafeAppend' function...

> With LDC (once we'll have a D2 version of it) the performance of D2
> can probably be the same as the C++. DMD maybe loses a little here
> because it's not so good at inlining, or maybe because the C++ vector
> is better than this D2 code.

I thought dev effort was now focusing back on GDC ... ? :-P

I have actually not made much use of the -inline flag because in
the code I wrote (maybe not best suited to inlining...), it made the
program generally run slower ...

Steven Schveighoffer wrote:
> The C++ example is reallocating memory, freeing memory it is no longer
> using.  It also manually handles the memory management, allocating larger
> and larger arrays in some algorithmically determined fashion (for example,
> multiplying the length by some constant factor).  This gives it an edge in
> performance because it does not have to do any costly lookup to determine
> if it can append in place, plus the realloc of the memory probably is
> cheaper than the GC realloc of D.

Right.  In fact you get precisely 24 allocs/deallocs, each doubling the
memory reserve to give a total capacity of 2^23 -- and then that memory is
there and can be used for the rest of the 100 iterations of the outer loop.
The shock for me was finding that D wasn't treating the memory like this
but was preserving each loop's memory (as you say, for good reason).

> D does not assume you stopped caring about the memory being pointed to
> when it had to realloc. [...] You can't do the same thing with C++
> vectors, when they reallocate, the memory they used to own could be
> freed.  This invalidates all pointers and iterators into the vector,
> but the language doesn't prevent you from having such dangling pointers.

I have a vague memory of trying to do something exactly like your example
when I was working with C++ for the first time, and getting bitten on the
arse by exactly the problem you describe.  I wish I could remember where.
I know that I found another (and possibly better) solution to do what I
wanted, but it would be nice to see if a D-ish solution would give me
something good.

> This must be fixed, the appender should be blazingly fast at appending
> (almost as fast as C++), with the drawback that the overhead is higher.

Overhead = memory cost?  I'm not so bothered as long as the memory stays
within constant, predictable bounds.  It was the memory explosion that
scared me.  I suspect I'd be willing to pay a small performance cost
(though it would have to be small) for the kind of safety and flexibility
the arrays have.

> You haven't done much with it yet.  When you start discovering how much D
> takes care of, you will be amazed :)

I know. :-)

My needs are in some ways quite narrow -- numerical simulations in
interdisciplinary physics -- hence the C background, and hence the premium
on performance.  They're also not very big programs -- simple enough for me
to keep a personal overview of the memory management, even though
with C++ that's usually all taken care of automatically (no new or delete
statements if I can avoid it).

What I'm fairly confident about is that, given not too much time, D will
become a _far_ preferable language for that kind of development.

> The thing about D is it *can* be fast and unsafe, just as fast and unsafe
> as C, but that's not the default.

That's apparent -- I mean, given that D wraps the whole C standard library,
I could basically write C code in D if I wanted, no?  But of course it would
have all the notational complexities of C, which is what I'd like to escape
from ... :-P