D's memory-hungry templates
tsbockman via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 9 07:46:12 PDT 2016
While working on a small PR
(https://github.com/dlang/phobos/pull/4420), I noticed that D's
template computation system has horrific memory consumption (as
well as being very slow).
I believe there are several reasons for this:
1) All template instantiations are memoized, even if they're just
private internal implementation details. This causes the space
complexity for recursive algorithms to generally be just as bad
as the time complexity, whereas it should usually be much better.
2) Everything is immutable, which sometimes forces O(N) array
algorithms to be replaced with O(N log(N)) tree algorithms. This
compounds with (1).
3) Even though D *requires* that all template algorithms be
recursive, the recursion limit is actually set very low (500).
This forces efficient linear recursion O(N) algorithms to be
replaced with wasteful O(N log(N)) binary recursion, unless N is
guaranteed to be very small. This compounds with (1), and
sometimes (2).
The combination of these issues causes very severe memory
consumption and speed problems; as an example
`staticSort!(aliasSeq!(iota(N)))`, which should have a time
complexity of O(N log(N)) and a space complexity of O(N), instead
seems to have a complexity of O(N^2 log(N)) for both time and
space. This is awful - worse than any normal sorting algorithm,
despite the fact that `staticSort` is based on the
normally-very-efficient "merge sort" algorithm.
On my system, for N = 450 the sort takes about 300 ms and
consumes 120 MB of memory. (With my pull request this is reduced
to 80 ms and 35 MB, but that's still terrible.)
Ultimately, I believe it was a mistake for D to implement a
separate, inferior programming language just for templates.
However, it is too late to change that now (at least for D2), so
I will offer some suggestions as to how memory consumption can be
reduced within the current design:
A) Members of a template instantiation should be eagerly
evaluated. Once a template has been fully evaluated, any private
member which is not referenced by a public one should be deleted.
B) Template instantiations should be deleted after they are no
longer accessible (even indirectly) via a top-level declaration.
C) The compiler should store `T...` in such a way that
recursively appending N items to the beginning or end of an
AliasSeq does not allocate more than O(N log(N)) additional
memory. (O(N) is possible with more indirections.)
D) Implement tail recursion optimization for templates. Tail
recursion should not count toward the recursion depth limit.
E) Consider eliminating the recursion limit entirely. Given that
the template system is Turing Complete and mandates heavy use of
recursion, there is no reason to think that a depth of 500 means
something has gone wrong, any more than for a `while` loop
running 500+ iterations. (Implementing this may require fixing
the exponential name growth, though.)
Implementing A, B, and C should get D's template memory
consumption under control.
Implementing D and E will make template computation more
flexible, encouraging people to use it more and find new things
to complain about. :-P
Thoughts?
More information about the Digitalmars-d
mailing list