Just another question about memory management in d from a newbie
H. S. Teoh
hsteoh at quickfur.ath.cx
Mon Jun 17 20:26:28 UTC 2019
On Mon, Jun 17, 2019 at 07:53:52PM +0000, Thomas via Digitalmars-d-learn wrote:
[...]
> int main()
> {
>     foreach (i; 0 .. 10000)
>     {
>         int x;
>         // do something with x
>     }
>     return 0;
> }
>
> Do I understand it right that the variable x will be created 10000
> times and destroyed at the end of the scope in each loop ? Or will it
> be 10000 overwritten by creation ?
If x were a heap-allocated object, your concern would be justified: it
would be allocated once every iteration (and would also add to the
garbage that the GC has to collect later).
However, in this case x is an int, which is a value type. This means it
will be allocated on the stack, where allocation is as trivial as
bumping the stack pointer (practically zero cost), and deallocation is
as simple as bumping it back the other way. In fact, the compiler
doesn't even have to bump the stack pointer between iterations: it's
obvious from code analysis that x will always occupy the same position
on the stack relative to the function's call frame, so the generated
machine code can simply use that memory location directly and reuse it
between iterations.
> I mean does it not cost the CPU some time to allocate (I know were
> talking here of nsec) but work is work. As far I know from my school
> days in assembler, allocation of memory is one of the most expensive
> instructions on the cpu. Activate memory block, find some free place,
> reserve, return pointer to caller, and so on..
That's not entirely accurate. It depends on how the allocation is done.
Allocation on the stack is extremely cheap, and consists literally of
adding some number (the size of the allocation) to a register (the stack
pointer) -- you cannot get much simpler than that.
Also, memory allocation on the heap is not "one of the most expensive
instructions". The reason it's expensive is that it's not a single
instruction, but an entire subroutine of instructions to manage the
heap. There is no CPU I know of that has a built-in instruction for
heap allocation.
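To make the contrast concrete, here's a minimal D sketch (the function
name is just for illustration) showing both kinds of allocation side by
side:

```d
void example()
{
    int x;            // stack: just a slot in the call frame,
                      // "allocated" by adjusting the stack pointer

    int* p = new int; // heap: calls into druntime's GC allocator --
                      // an entire subroutine (find a free block, do
                      // bookkeeping, possibly run a collection), not
                      // a single CPU instruction
    *p = 42;
    // *p is reclaimed later by the GC, not at end of scope
}
```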
> In my opinion this version should perform a little better:
>
> int main()
> {
>     int x;
>     foreach (i; 0 .. 10000)
>     {
>         x = 0; // reinitialize
>         // do something with x
>     }
>     return 0;
> }
>
> Or do I miss something and there is an optimization by the compiler to
> avoid recreating/destroying the variable x 10000 times ?
[...]
What you wrote here is exactly the machine code the compiler will emit
for your first code example above. Basically all modern optimizing
compilers will implement it this way, so you never have to worry about
stack allocation being slow.
The only time you will have a performance problem is if x is a
heap-allocated object. In *that* case you will want to look into reusing
previous instances of the object between iterations. But if x is a
value type (int, or struct, etc.), you don't have to worry about this at
all.
The first version of the code you have above is the preferred one,
because it makes the code clearer.
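For the heap-allocated case, the usual trick is to hoist the allocation
out of the loop and reuse the instance. A sketch of the idea -- the
Buffer class and its reset method are made up for illustration:

```d
class Buffer
{
    int[] data;

    void reset()
    {
        data.length = 0;
        data.assumeSafeAppend(); // let appends reuse the old storage
                                 // instead of reallocating
    }
}

void main()
{
    auto buf = new Buffer;       // allocate once, outside the loop
    foreach (i; 0 .. 10000)
    {
        buf.reset();             // reuse the same instance each iteration
        buf.data ~= i;           // do something with buf
    }
}
```

This keeps the per-iteration cost down to resetting the object's state,
rather than allocating 10000 objects for the GC to clean up later.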
On a related note, in modern programming languages you should generally
be more concerned about the higher-level meaning of the code than about
low-level details like which instructions are generated. The machine
code the compiler produces is highly transformed from the original
source, and generally has no 1-to-1 mapping to the code's higher-level
logical meaning. For example, the first version of the code *logically*
allocates x at the beginning of each loop iteration and deallocates it
at the end, but the generated machine code elides all of that: the
optimizer can easily see that it's a stack allocation that always ends
up in the same place, so none of the allocation/deallocation needs to
be represented as-is in the machine code.
On another note, for performance-related concerns the general advice
these days is to write something in the most straightforward, logical
way first, and then if the performance is not good enough, **use a
profiler** to identify where the hotspots are, and optimize those.
Trying to optimize before you have real-world profiling data showing
that your code is a hotspot is premature optimization, widely regarded
as evil because it usually leads to overly convoluted code that's hard
to understand and maintain -- and often actually *slower*, because the
way you expressed the meaning of the code has become so obfuscated that
the optimizer can't figure out what you're actually trying to do, so it
gives up and doesn't even try to apply optimizations that may have
benefitted the code.
(Of course, the above is with the caveat that writing "straightforward"
code only holds up to a certain extent; when it comes to algorithms, for
example, no compiler is going to change an O(n^2) algorithm into an
O(log n) one; you have to select the appropriate algorithm in the first
place in order to avoid the inevitable performance bottleneck. And there
are certain coding practices that can result in overall better
performance, but these come with experience, and should not be blindly
applied in the hopes that it will magically make your program faster. If
you have a hotspot somewhere in the program, you can optimize everything
else to hell and back and it will still be slow, whereas if you use a
profiler to identify the hotspot in the first place, a 1-line change
could have won you an 80% performance boost where long hours of blood
and sweat won you only 5%. You need to identify and eliminate the big
hotspots first, and *then* come back and trim off the smaller,
lower-level stuff for that extra 2-5% win. And experience shows that
generally, most predictions of where the hotspots will be are wrong.
Always use a profiler before jumping to conclusions.)
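For D specifically, one easy way to get such data (assuming the
reference dmd compiler; ldc and gdc work with the usual native
profilers like perf) is dmd's built-in instrumenting profiler:

```
$ dmd -profile app.d
$ ./app
$ cat trace.log     # per-function call counts and timings
```

That trace.log usually makes it immediately obvious which handful of
functions dominate the runtime, and those are the ones worth optimizing.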
T
--
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N