D's memory-hungry templates

tsbockman via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 9 07:46:12 PDT 2016


While working on a small PR 
(https://github.com/dlang/phobos/pull/4420), I noticed that D's 
template computation system has horrific memory consumption (as 
well as being very slow).

I believe there are several reasons for this:

1) All template instantiations are memoized, even if they're just 
private internal implementation details. This causes the space 
complexity for recursive algorithms to generally be just as bad 
as the time complexity, whereas it should usually be much better.

2) Everything is immutable, which sometimes forces O(N) array 
algorithms to be replaced with O(N log(N)) tree algorithms. This 
compounds with (1).

3) Even though D *requires* that all template algorithms be 
recursive, the recursion limit is actually set very low (500). 
This forces efficient linear recursion O(N) algorithms to be 
replaced with wasteful O(N log(N)) binary recursion, unless N is 
guaranteed to be very small. This compounds with (1), and 
sometimes (2).

The combination of these issues causes very severe memory 
consumption and speed problems; as an example 
`staticSort!(aliasSeq!(iota(N)))`, which should have a time 
complexity of O(N log(N)) and a space complexity of O(N), instead 
seems to have a complexity of O(N^2 log(N)) for both time and 
space. This is awful - worse than any normal sorting algorithm, 
despite the fact that `staticSort` is based on the 
normally-very-efficient "merge sort" algorithm.

On my system, for N = 450 the sort takes about 300 ms and 
consumes 120 MB of memory. (With my pull request this is reduced 
to 80 ms and 35 MB, but that's still terrible.)

Ultimately, I believe it was a mistake for D to implement a 
separate, inferior programming language just for templates. 
However, it is too late to change that now (at least for D2), so 
I will offer some suggestions as to how memory consumption can be 
reduced within the current design:

A) Members of a template instantiation should be eagerly 
evaluated. Once a template has been fully evaluated, any private 
member which is not referenced by a public one should be deleted.

B) Template instantiations should be deleted after they are no 
longer accessible (even indirectly) via a top-level declaration.

C) The compiler should store `T...` in such a way that 
recursively appending N items to the beginning or end of an 
AliasSeq does not allocate more than O(N log(N)) additional 
memory. (O(N) is possible with more indirections.)

D) Implement tail recursion optimization for templates. Tail 
recursion should not count toward the recursion depth limit.

E) Consider eliminating the recursion limit entirely. Given that 
the template system is Turing Complete and mandates heavy use of 
recursion, there is no reason to think that a depth of 500 means 
something has gone wrong, any more than for a `while` loop 
running 500+ iterations. (Implementing this may require fixing 
the exponential name growth, though.)

Implementing A, B, and C should get D's template memory 
consumption under control.
Implementing D and E will make template computation more 
flexible, encouraging people to use it more and find new things 
to complain about.  :-P

Thoughts?


More information about the Digitalmars-d mailing list