What is the memory usage of my app?

Fri Apr 17 07:50:28 PDT 2015

On Friday, 17 April 2015 at 14:49:19 UTC, Márcio Martins wrote:
> On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
>> I've written a simple socket-server app that securities (stock
>> market shares) data and allows clients to query over them. The
>> app starts by loading instrument information from a CSV file 
>> into
>> some structs, then listens on a socket responding to queries. 
>> It
>> doesn't mutate the data or allocate anything substantial.
>>
>> There are 2 main structs in the app. One stores security data,
>> and the other groups together securities. They are defined as
>> follows :
>>
>> ````
>> __gshared Securities securities;
>>
>> struct Security
>> {
>>          string RIC;
>>          string TRBC;
>>          string[string] fields;
>>          double[string] doubles;
>>
>>          @nogc @property pure size_t bytes()
>>          {
>>              size_t bytes;
>>
>>              bytes = RIC.sizeof + RIC.length;
>>              bytes += TRBC.sizeof + TRBC.length;
>>
>>              foreach(k,v; fields) {
>>                  bytes += (k.sizeof + k.length + v.sizeof +
>> v.length);
>>              }
>>
>>              foreach(k, v; doubles) {
>>                  bytes += (k.sizeof + k.length + v.sizeof);
>>              }
>>
>>              return bytes + Security.sizeof;
>>          }
>> }
>>
>> struct Securities
>> {
>>          Security[] securities;
>>          private size_t[string] rics;
>>
>>          // Store offsets for each TRBC group
>>          ulong[2][string] econSect;
>>          ulong[2][string] busSect;
>>          ulong[2][string] IndGrp;
>>          ulong[2][string] Ind;
>>
>>          @nogc @property pure size_t bytes()
>>          {
>>              size_t bytes;
>>
>>              foreach(Security s; securities) {
>>                  bytes += s.sizeof + s.bytes;
>>              }
>>
>>              foreach(k, v; rics) {
>>                  bytes += k.sizeof + k.length + v.sizeof;
>>              }
>>
>>              foreach(k, v; econSect) {
>>                  bytes += k.sizeof + k.length + v.sizeof;
>>              }
>>
>>              foreach(k, v; busSect) {
>>                  bytes += k.sizeof + k.length + v.sizeof;
>>              }
>>
>>              foreach(k, v; IndGrp) {
>>                  bytes += k.sizeof + k.length + v.sizeof;
>>              }
>>
>>              foreach(k, v; Ind) {
>>                  bytes += k.sizeof + k.length + v.sizeof;
>>              }
>>
>>              return bytes + Securities.sizeof;
>>          }
>> }
>> ````
>>
>> Calling Securities.bytes shows "188 MB", but "ps" shows about 
>> 591
>> MB of Resident memory. Where is the memory usage coming from?
>> What am i missing?
>
> After a quick look, it seems like you are only count the fields 
> memory in the associative arrays, but forgetting about the 
> internal data structure memory - this is a common mistake.
>
> Depending on D's associative array implementation and growth 
> policies, (which I am not familiar with, yet), you might be 
> paying a lot of overhead from having so many of them, all of 
> them holding relatively small types,
> which make the overhead/payload ratio very bad.
> Unfortunately, to my knowledge, there is no way to query the 
> current capacity or load factor of an AA.
>
> If I am reading druntime's code correctly, if your hash table 
> contains at least five elements, you are already paying at 
> least for sizeof(void*) * 31. The 31 grows based on predefined 
> prime number list you can see here: 
> https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L36
>
> I hope you can see how this overhead is gigantic for your case, 
> when you're mapping string -> double, or string -> ulong[2]
>
> In addition, each allocation on the runtime heap incurs a 
> booking keeping cost of at least one pointer size, *often 
> more*, and a lot of times an addition extra padding cost for 
> alignment requirements.
>
> There are a few more hidden costs that you can't easily avoid 
> or even calculate from within your binary that you will see in 
> the size the OS reports.
>
> The solution in your case is to use more flat arrays and less 
> AAs.
> AAs are not a silver bullet! Sometimes it's faster to do 
> linear/binary search in a contiguous block of an array than to 
> search through an AA. This is very often the case for D's 
> current AA implementation.
>
> Rant: I think D's associative array implementation is pretty 
> bad for such an integral and often used part of the language. 
> Mostly due to it being implemented in the runtime, as opposed 
> to being an inlineable library template, but also because it's 
> using an old-school linked-list approach which is pretty bad 
> for you CPU caches. I generally roll my own hash tables for 
> perf sensitive scenarios, which are more cpu efficient and 
> almost always also more memory efficient.
>
>
> Sorry for the wall of text! I thought I'd elaborate a bit more 
> since I rarely see these hidden costs mentioned anywhere, in 
> addition to a general overuse of AAs.

Sorry for the poor grammar - I hate it that I can't edit posts :P