Example of Why Reference Counting is Important

Thu Aug 10 13:33:21 UTC 2023

**TL;DR**: Reference Counting is a frequent topic of discussion 
in the D community. The motivations aren't always clear. In the 
context of machine learning, reference counting is valuable due 
to the low memory capacity of graphics cards, and the need to 
free unused memory as quickly as possible.

Every year, especially at DConf, there is a lot of talk about 
the various approaches to reference counting and the difficulties 
involved. Taken out of context, reference counting has plenty of 
pros and cons 
(https://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Reference_counting). 
Even so, the reasons to choose it over tracing garbage collection 
or escape analysis weren't very clear to me.

In the course of my work, though, I came across a clear use case 
that others might find illuminating. The context is machine 
learning, where large numbers of partial derivatives of an error 
function must be computed with respect to hundreds of thousands, 
if not millions, of parameters. Due to the high number of matrix 
operations involved, this work is primarily vectorized and 
executed on graphics cards.

While working on this, I came across the following passage in a 
paper on PyTorch (https://pytorch.org/), a well-known machine 
learning library.

https://openreview.net/pdf?id=BJJsrmfCZ
> **Memory management** The main use case for PyTorch is training 
> machine learning models on
> GPU. As one of the biggest limitations of GPUs is low memory 
> capacity, PyTorch takes great care to
> make sure that all intermediate values are freed as soon as 
> they become unneeded. Indeed, Python is
> well-suited for this purpose, because it is reference counted 
> by default (using a garbage collector only
> to break cycles).
> PyTorch’s Variable and Function must be designed to work well 
> in a reference counted regime.
> For example, a Function records pointers to the Function which 
> consumes its result, so that a
> Function subgraph is freed when its retaining output Variable 
> becomes dead. This is opposite of the
> conventional ownership for closures, where a closure retains 
> the closures it invokes (a pointer to the
> Function which produces its result.)
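
To make the quoted behaviour concrete, here is a minimal sketch in plain Python (no PyTorch involved). It assumes CPython, where reference counting is the primary mechanism; `Buffer`, `make`, and `freed` are hypothetical names invented for this illustration, with `Buffer` standing in for a large intermediate value such as a GPU tensor. It shows that an object is released at the exact point its last reference goes away, and that a reference cycle is only reclaimed once the cycle-breaking collector runs.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a large intermediate value (e.g. a GPU tensor)."""

    def __init__(self, name):
        self.name = name


freed = []  # names of buffers, in the order they were released


def make(name):
    buf = Buffer(name)
    # Run freed.append(name) when buf is destroyed (standard library weakref.finalize).
    weakref.finalize(buf, freed.append, name)
    return buf


gc.disable()  # switch off the cycle collector so only reference counting is at work

a = make("a")
b = make("b")
del a  # last reference dropped: under CPython the buffer is released right here
a_freed_immediately = freed == ["a"]

# A reference cycle keeps the count above zero, so reference counting alone
# never releases this object.
c = make("c")
c.self_ref = c
del c
c_leaked_without_gc = "c" not in freed

gc.enable()
gc.collect()  # the tracing collector finds and breaks the cycle
c_freed_by_gc = "c" in freed

print(a_freed_immediately, c_leaked_without_gc, c_freed_by_gc)
```

The `del a` line is the important one for the GPU case: there is no wait for a collection pass, so the memory can be reused by the very next operation. Other Python implementations may not give the same timing guarantee.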

I hope this provides some clarity and insight to others.