What is the memory usage of my app?

Fri Apr 17 07:49:18 PDT 2015

On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
> I've written a simple socket-server app that loads securities (stock
> market shares) data and allows clients to query over them. The
> app starts by loading instrument information from a CSV file into
> some structs, then listens on a socket responding to queries. It
> doesn't mutate the data or allocate anything substantial.
>
> There are 2 main structs in the app. One stores security data,
> and the other groups together securities. They are defined as
> follows :
>
> ````
> __gshared Securities securities;
>
> struct Security
> {
>           string RIC;
>           string TRBC;
>           string[string] fields;
>           double[string] doubles;
>
>           @nogc @property pure size_t bytes()
>           {
>               size_t bytes;
>
>               bytes = RIC.sizeof + RIC.length;
>               bytes += TRBC.sizeof + TRBC.length;
>
>               foreach(k,v; fields) {
>                   bytes += (k.sizeof + k.length + v.sizeof + v.length);
>               }
>
>               foreach(k, v; doubles) {
>                   bytes += (k.sizeof + k.length + v.sizeof);
>               }
>
>               return bytes + Security.sizeof;
>           }
> }
>
> struct Securities
> {
>           Security[] securities;
>           private size_t[string] rics;
>
>           // Store offsets for each TRBC group
>           ulong[2][string] econSect;
>           ulong[2][string] busSect;
>           ulong[2][string] IndGrp;
>           ulong[2][string] Ind;
>
>           @nogc @property pure size_t bytes()
>           {
>               size_t bytes;
>
>               foreach(Security s; securities) {
>                   bytes += s.sizeof + s.bytes;
>               }
>
>               foreach(k, v; rics) {
>                   bytes += k.sizeof + k.length + v.sizeof;
>               }
>
>               foreach(k, v; econSect) {
>                   bytes += k.sizeof + k.length + v.sizeof;
>               }
>
>               foreach(k, v; busSect) {
>                   bytes += k.sizeof + k.length + v.sizeof;
>               }
>
>               foreach(k, v; IndGrp) {
>                   bytes += k.sizeof + k.length + v.sizeof;
>               }
>
>               foreach(k, v; Ind) {
>                   bytes += k.sizeof + k.length + v.sizeof;
>               }
>
>               return bytes + Securities.sizeof;
>           }
> }
> ````
>
> Calling Securities.bytes shows "188 MB", but "ps" shows about 591
> MB of resident memory. Where is the memory usage coming from?
> What am I missing?

After a quick look, it seems you are only counting the memory of the
keys and values stored in the associative arrays, and forgetting about
the memory of the internal data structure. This is a common mistake.

Depending on D's associative array implementation and growth
policies (which I am not familiar with yet), you might be paying a
lot of overhead for having so many AAs, all of them holding
relatively small types, which makes the overhead/payload ratio very
bad. Unfortunately, to my knowledge, there is no way to query the
current capacity or load factor of an AA.

If I am reading druntime's code correctly, once your hash table
contains at least five elements you are already paying at least
sizeof(void*) * 31 bytes. The 31 grows according to a predefined list
of prime numbers, which you can see here:
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L36
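
To put rough numbers on that, here is a back-of-the-envelope sketch.
Only the 31-bucket figure comes from my reading above. The per-entry
node layout (a next pointer plus a cached hash) and the name
estimateAAOverhead are assumptions I made up for illustration, not
something taken from druntime.

```d
// Rough estimate of what an AA costs beyond its keys and values.
// ASSUMPTIONS (hypothetical, not druntime's real layout): one pointer
// per bucket, plus one heap node per entry that carries a `next`
// pointer and a cached hash in addition to the key and value.
enum size_t assumedBuckets = 31; // smallest prime mentioned above

size_t estimateAAOverhead(size_t entries, size_t buckets = assumedBuckets)
    @safe pure nothrow @nogc
{
    immutable bucketArray = buckets * (void*).sizeof;
    immutable perNode = (void*).sizeof + size_t.sizeof; // next + hash
    return bucketArray + entries * perNode;
}

unittest
{
    // A double[string] with five entries: the slices and doubles
    // themselves (not counting the key characters) versus the
    // estimated overhead.
    enum entries = 5;
    immutable payload = entries * (string.sizeof + double.sizeof);
    assert(estimateAAOverhead(entries) > payload);
}
```

Under these assumptions the overhead of a five-entry string -> double
table is larger than the slices and doubles it stores, and you pay
that once per Security.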

I hope you can see how gigantic this overhead is in your case, where
you're mapping string -> double or string -> ulong[2].

In addition, each allocation on the runtime heap incurs a bookkeeping
cost of at least one pointer size, *often more*, and frequently an
additional padding cost for alignment requirements.
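
As a toy model of that effect, assume an allocator that adds a fixed
header to every request and rounds the total up to the next
power-of-two size class, with a 16-byte minimum. These numbers and the
name allocatedSize are assumptions for illustration only. I am not
claiming this is the D GC's actual policy.

```d
// Hypothetical allocator model: fixed header per allocation, then
// round up to the next power-of-two size class (minimum 16 bytes).
// The header size and size classes are assumptions, not measured.
size_t allocatedSize(size_t requested, size_t header = (void*).sizeof)
    @safe pure nothrow @nogc
{
    immutable total = requested + header;
    size_t bin = 16;
    while (bin < total)
        bin *= 2;
    return bin;
}

unittest
{
    // With an 8-byte header, a 24-byte request fits a 32-byte class...
    assert(allocatedSize(24, 8) == 32);
    // ...but one more byte spills into the 64-byte class.
    assert(allocatedSize(25, 8) == 64);
}
```

In a model like this, many small allocations can cost close to twice
what you asked for, and none of it shows up in a sum of .sizeof and
.length.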

There are a few more hidden costs that you can't easily avoid, or
even calculate from within your binary, but that you will see in the
size the OS reports.

The solution in your case is to use more flat arrays and fewer AAs.
AAs are not a silver bullet! Sometimes it's faster to do a linear or
binary search in a contiguous block of an array than to search
through an AA. This is very often the case for D's current AA
implementation.
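
A minimal sketch of what I mean, assuming the per-security fields can
be sorted once after loading since the data is never mutated
afterwards. The names Field, prepare and find are hypothetical, and
the binary search is written by hand to keep the example
self-contained.

```d
import std.algorithm : sort;

// One key/value pair, stored inline in a contiguous array.
struct Field
{
    string key;
    double value;
}

/// Sorts the fields by key; call once after loading.
void prepare(Field[] fields)
{
    fields.sort!((a, b) => a.key < b.key);
}

/// Binary search over fields sorted by key.
/// Returns a pointer to the value, or null if the key is absent.
const(double)* find(const(Field)[] fields, string key)
{
    size_t lo = 0, hi = fields.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (fields[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < fields.length && fields[lo].key == key)
        ? &fields[lo].value : null;
}

unittest
{
    auto fields = [Field("PE", 12.5), Field("BID", 101.25),
                   Field("ASK", 101.5)];
    prepare(fields);
    assert(*find(fields, "BID") == 101.25);
    assert(find(fields, "VOLUME") is null);
}
```

Each Security then owns a single array allocation for its doubles
instead of a bucket array plus one node per entry.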

Rant: I think D's associative array implementation is pretty bad for
such an integral and often-used part of the language. Mostly due to
it being implemented in the runtime, as opposed to being an
inlineable library template, but also because it's using an
old-school linked-list approach which is pretty bad for your CPU
caches. I generally roll my own hash tables for performance-sensitive
scenarios, which are more CPU efficient and almost always also more
memory efficient.
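
To give an idea of the kind of table I mean, here is a minimal
open-addressing sketch with linear probing, where all entries live in
one flat array instead of linked nodes. FlatTable and its members are
hypothetical names, the hash is a plain FNV-1a over the key's bytes,
and it is deliberately incomplete: fixed capacity, no removal, no
resizing.

```d
/// Minimal open-addressing table (linear probing), string -> double.
/// A sketch only: fixed capacity, no removal, no resizing.
struct FlatTable
{
    private static struct Slot
    {
        string key;
        double value;
        bool used; // false marks an empty slot
    }

    private Slot[] slots;
    private size_t count;

    /// `capacity` must be a power of two and larger than the number
    /// of entries you intend to store.
    this(size_t capacity)
    {
        assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
        slots = new Slot[capacity];
    }

    // 32-bit FNV-1a over the key's code units.
    private static uint fnv1a(string s) @safe pure nothrow @nogc
    {
        uint h = 2166136261u;
        foreach (char c; s)
        {
            h ^= c;
            h *= 16777619u;
        }
        return h;
    }

    // Index of the slot holding `key`, or of the empty slot where it
    // would go. Terminates because one slot is always kept empty.
    private size_t indexOf(string key) const
    {
        immutable mask = slots.length - 1;
        size_t i = fnv1a(key) & mask;
        while (slots[i].used && slots[i].key != key)
            i = (i + 1) & mask;
        return i;
    }

    void put(string key, double value)
    {
        immutable i = indexOf(key);
        if (!slots[i].used)
        {
            assert(count + 1 < slots.length, "table full");
            slots[i].key = key;
            slots[i].used = true;
            ++count;
        }
        slots[i].value = value;
    }

    /// Returns a pointer to the value, or null if the key is absent.
    const(double)* get(string key) const
    {
        immutable i = indexOf(key);
        return slots[i].used ? &slots[i].value : null;
    }
}

unittest
{
    auto t = FlatTable(8);
    t.put("BID", 101.25);
    t.put("ASK", 101.5);
    t.put("BID", 101.0); // overwrites the earlier value
    assert(*t.get("BID") == 101.0);
    assert(*t.get("ASK") == 101.5);
    assert(t.get("LAST") is null);
}
```

A lookup here touches one contiguous array, and the only allocation
is the slot array itself.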

Sorry for the wall of text! I thought I'd elaborate a bit more 
since I rarely see these hidden costs mentioned anywhere, in 
addition to a general overuse of AAs.