What is the memory usage of my app?
Adil via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sat Apr 18 04:06:19 PDT 2015
On Friday, 17 April 2015 at 14:50:29 UTC, Márcio Martins wrote:
> On Friday, 17 April 2015 at 14:49:19 UTC, Márcio Martins wrote:
>> On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
>>> I've written a simple socket-server app that serves securities
>>> (stock market shares) data and allows clients to query over it.
>>> The app starts by loading instrument information from a CSV file
>>> into some structs, then listens on a socket responding to
>>> queries. It doesn't mutate the data or allocate anything
>>> substantial.
>>>
>>> There are 2 main structs in the app. One stores security data,
>>> and the other groups together securities. They are defined as
>>> follows :
>>>
>>> ````
>>> __gshared Securities securities;
>>>
>>> struct Security
>>> {
>>>     string RIC;
>>>     string TRBC;
>>>     string[string] fields;
>>>     double[string] doubles;
>>>
>>>     @nogc @property pure size_t bytes()
>>>     {
>>>         size_t bytes;
>>>
>>>         bytes = RIC.sizeof + RIC.length;
>>>         bytes += TRBC.sizeof + TRBC.length;
>>>
>>>         foreach (k, v; fields) {
>>>             bytes += (k.sizeof + k.length + v.sizeof + v.length);
>>>         }
>>>
>>>         foreach (k, v; doubles) {
>>>             bytes += (k.sizeof + k.length + v.sizeof);
>>>         }
>>>
>>>         return bytes + Security.sizeof;
>>>     }
>>> }
>>>
>>> struct Securities
>>> {
>>>     Security[] securities;
>>>     private size_t[string] rics;
>>>
>>>     // Store offsets for each TRBC group
>>>     ulong[2][string] econSect;
>>>     ulong[2][string] busSect;
>>>     ulong[2][string] IndGrp;
>>>     ulong[2][string] Ind;
>>>
>>>     @nogc @property pure size_t bytes()
>>>     {
>>>         size_t bytes;
>>>
>>>         foreach (Security s; securities) {
>>>             bytes += s.sizeof + s.bytes;
>>>         }
>>>
>>>         foreach (k, v; rics) {
>>>             bytes += k.sizeof + k.length + v.sizeof;
>>>         }
>>>
>>>         foreach (k, v; econSect) {
>>>             bytes += k.sizeof + k.length + v.sizeof;
>>>         }
>>>
>>>         foreach (k, v; busSect) {
>>>             bytes += k.sizeof + k.length + v.sizeof;
>>>         }
>>>
>>>         foreach (k, v; IndGrp) {
>>>             bytes += k.sizeof + k.length + v.sizeof;
>>>         }
>>>
>>>         foreach (k, v; Ind) {
>>>             bytes += k.sizeof + k.length + v.sizeof;
>>>         }
>>>
>>>         return bytes + Securities.sizeof;
>>>     }
>>> }
>>> ````
>>>
>>> Calling Securities.bytes shows "188 MB", but "ps" shows about
>>> 591 MB of resident memory. Where is the memory usage coming
>>> from? What am I missing?
>>
>> After a quick look, it seems like you are only counting the
>> memory of the fields stored in the associative arrays, and
>> forgetting the memory of their internal data structures - this
>> is a common mistake.
>>
>> Depending on D's associative array implementation and growth
>> policies (which I am not familiar with yet), you might be paying
>> a lot of overhead from having so many of them, all holding
>> relatively small types, which makes the overhead/payload ratio
>> very bad. Unfortunately, to my knowledge, there is no way to
>> query the current capacity or load factor of an AA.
>>
>> If I am reading druntime's code correctly, once your hash table
>> contains at least five elements, you are already paying at least
>> sizeof(void*) * 31 for the bucket array alone. The 31 grows
>> along a predefined list of primes, which you can see here:
>> https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L36
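>>
>> To put rough numbers on it, here is a back-of-envelope sketch
>> (the prime values are the first few entries from that table; on
>> a 64-bit target each bucket slot is an 8-byte pointer):
>>
>> ````
>> import std.stdio : writefln;
>>
>> void main()
>> {
>>     // First few bucket counts from druntime's prime table.
>>     immutable size_t[] primes = [31, 97, 389, 1543, 6151];
>>     foreach (p; primes)
>>         writefln("%5s buckets -> %6s bytes of pointers per AA",
>>                  p, p * (void*).sizeof);
>> }
>> ````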
>>
>> I hope you can see how gigantic this overhead is in your case,
>> where you are mapping string -> double or string -> ulong[2].
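>>
>> You can also measure the real cost empirically. Below is a
>> minimal sketch (my own illustration, not code from your app)
>> that builds many small string -> double AAs and reports the
>> average heap cost per map; it assumes a druntime recent enough
>> to expose GC.stats in core.memory:
>>
>> ````
>> import core.memory : GC;
>> import std.stdio : writefln;
>>
>> void main()
>> {
>>     enum n = 100_000;
>>     immutable before = GC.stats().usedSize;
>>
>>     // Many small string -> double maps, like the per-security
>>     // AAs above. The keys are literals, so they live in static
>>     // data and don't inflate the measurement.
>>     double[string][] maps;
>>     maps.length = n;
>>     foreach (i; 0 .. n)
>>     {
>>         maps[i]["px_last"] = 1.0;
>>         maps[i]["volume"] = 2.0;
>>     }
>>
>>     immutable after = GC.stats().usedSize;
>>     writefln("~%s bytes per AA for a payload of two doubles",
>>              (after - before) / n);
>> }
>> ````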
>>
>> In addition, each allocation on the runtime heap incurs a
>> bookkeeping cost of at least one pointer size, *often more*, and
>> frequently an extra padding cost to satisfy alignment
>> requirements.
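>>
>> You can see the padding directly: the GC rounds each allocation
>> up to a size class, and GC.sizeOf reports the block actually
>> reserved. A tiny sketch (the exact numbers depend on your
>> druntime version):
>>
>> ````
>> import core.memory : GC;
>> import std.stdio : writeln;
>>
>> void main()
>> {
>>     auto p = GC.malloc(24);  // ask for 24 bytes...
>>     writeln(GC.sizeOf(p));   // ...the reserved block is
>>                              // typically 32
>> }
>> ````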
>>
>> There are a few more hidden costs that you can't easily avoid,
>> or even calculate from within your binary, but that do show up
>> in the size the OS reports.
>>
>> The solution in your case is to use more flat arrays and fewer
>> AAs. AAs are not a silver bullet! Sometimes a linear or binary
>> search through a contiguous block of an array is faster than a
>> lookup in an AA - this is very often the case with D's current
>> AA implementation. A sketch of the idea follows.
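>>
>> For example, here is a minimal sketch of replacing a
>> string -> double AA with a sorted flat array plus binary search
>> (the Entry type and the find helper are my own illustration,
>> not names from your app):
>>
>> ````
>> import std.algorithm : sort;
>> import std.range : assumeSorted;
>> import std.stdio : writeln;
>>
>> struct Entry { string key; double value; }
>>
>> // Binary search over entries sorted by key; one contiguous
>> // allocation instead of one hash table per security.
>> double find(Entry[] sortedEntries, string key,
>>             double missing = double.nan)
>> {
>>     auto hits = sortedEntries
>>         .assumeSorted!((a, b) => a.key < b.key)
>>         .equalRange(Entry(key, double.init));
>>     return hits.empty ? missing : hits.front.value;
>> }
>>
>> void main()
>> {
>>     auto fields = [Entry("volume", 2.0), Entry("px_last", 1.0)];
>>     fields.sort!((a, b) => a.key < b.key); // sort once at load
>>     writeln(find(fields, "px_last"));      // prints 1
>> }
>> ````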
>>
>> Rant: I think D's associative array implementation is pretty
>> bad for such an integral and often-used part of the language.
>> Mostly due to it being implemented in the runtime, as opposed
>> to being an inlineable library template, but also because it
>> uses an old-school linked-list approach which is pretty bad for
>> your CPU caches. I generally roll my own hash tables for
>> perf-sensitive scenarios, which are more CPU-efficient and
>> almost always also more memory-efficient.
>>
>>
>> Sorry for the wall of text! I thought I'd elaborate a bit more
>> since I rarely see these hidden costs mentioned anywhere, in
>> addition to a general overuse of AAs.
>
> Sorry for the poor grammar - I hate it that I can't edit posts
> :P
Thank you for your insight Marcio. That was helpful. I'm inclined
to agree with you. I noticed some strange behaviour on behalf of
the garbage collector as well. If I run GC.collect() once every
1000 iterations, it seems to shave off 200 MB of memory usage!
That's a lot! I experimented with collecting at higher and lower
frequencies; one collection per 1000 iterations seemed to get the
most savings while keeping load times acceptable.
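
In case it helps anyone else, the load loop now looks roughly
like this (parseAndStore is a stand-in for the real per-row work,
not its actual name):

````
import core.memory : GC;

void parseAndStore(string line) { /* hypothetical per-row work */ }

void loadAll(string[] lines)
{
    foreach (i, line; lines)
    {
        parseAndStore(line);
        if (i > 0 && i % 1000 == 0)
            GC.collect();  // one collection per 1000 rows
    }
}
````
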
This suggests, as you said, lots of small allocations, but also
that they're not being reclaimed when a GC.collect cycle is run -
not reliably, anyway. This is a bit disappointing, but I guess
the GC is a work in progress. I'm afraid I don't know enough
about the GC to talk intelligently about it any further.
+1 for looking up the druntime source! Reminder to self: check
source of open source project :)