A possible leak

Vladimir Panteleev thecybershadow at gmail.com
Sat Oct 3 17:21:59 PDT 2009


On Thu, 01 Oct 2009 14:23:48 +0300, bearophile <bearophileHUGS at lycos.com>  
wrote:

> Time ago Jon Harrop has found a memory leak in a F# program running on  
> Mono, he has reduced the program to a minimal test case. I have  
> translated that code to D2 and Python2:

Sorry for the late reply. I wanted to get to the bottom of this, and  
facilitate others to do the same easily, thus this motivated me to improve  
Diamond to make that possible.

What basically happens here is that the program creates an infinite  
contiguous linked list. Even though there is no reference to the original  
node, a stray reference to any of the nodes in the list will prevent any  
nodes following it to be freed. The program's behavior was inconsistent on  
my machine (it either leaked or didn't leak memory), because the memory  
pools were allocated at different memory addresses.

*** SHAMELESS PLUG ALERT ***

Diamond ( my post-mortem memory debugger -  
http://dsource.org/projects/diamond ) should allow to diagnose these  
situations easily, more so with the latest additions. Here's an example of  
how it's possible to easily analyze this program's memory behavior and  
find the source of the leak:

Firstly, the program was backported to D1 (I don't use D2, thus Diamond  
doesn't support it ATM). Its execution time is limited, to allow a clean  
exit. The final source is as follows:

-------------------------------------------
import diamond;

struct Node {
     int data;
     Node* next;
}

void main() {
     Node* lh = new Node;
     lh.data = 1;
     lh.next = lh;

     for(int i=0;i<1000000;i++) {
         Node* lh2 = lh;
         lh = new Node;
         lh.data = 2;
         lh.next = lh2.next;
         lh2.next = lh;
         lh2 = lh;
         lh2.next = lh2.next.next;
     }
}
-------------------------------------------

The diamond runtime module was configured with only the MEMLOG and  
MEMLOG_CRC32 options.

The program was compiled and executed; the runtime module created a .mem  
file which contained a log of memory operations, and full contents for  
every garbage collect. This file was loaded into the memory log analyzer:

> Diamond Memory Log Analyzer, v0.2
> by Vladimir "CyberShadow" Panteleev, 2008-2009
>
> Using the most recent memory dump file.
> Loading diamond_2009-09-04_02.25.49.mem...
> 1000013 events loaded.
> Using the most recent map file.
> 1047 symbols loaded from test1final.map.
> > dumps
>   65023 @ 02:25:49: MEMDUMP (   256/   256/   256)
>   89503 @ 02:25:49: MEMDUMP (   256/   256/   256)
>  155041 @ 02:25:49: MEMDUMP (   512/   512/   512)
>  286115 @ 02:25:49: MEMDUMP (  1024/  1024/  1024)
>  482725 @ 02:25:50: MEMDUMP (  1792/  1792/  1792)
>  744871 @ 02:25:51: MEMDUMP (  2816/  2816/  2816)

MEMDUMP events are generated automatically before garbage collects.  
Judging by the history, the GC failed to free up space starting with the  
second collect. Let's look at the memory map before and after the first  
collect:

> > nextdump
> D65023> map
> Page map for pool 02090000 - 02190000 ( 256/ 256/ 256 pages):
> (page size = 0x1000)            +10000           +20000           +30000
> 02090000: 6744444444444444 4444444444444444 4444444444444444  
> 4444444444444444
> 020D0000: 4444444444444444 4444444444444444 4444444444444444  
> 4444444444444444
> 02110000: 4444444444444444 4444444444444444 4444444444444444  
> 4444444444444444
> 02150000: 4444444444444444 4444444444444444 4444444444444444  
> 4444444444444444
> D65023> n
> M65024> map
> Page map for pool 02090000 - 02190000 ( 162/ 256/ 256 pages):
> (page size = 0x1000)            +10000           +20000           +30000
> 02090000: 674............. ................ ................  
> ................
> 020D0000: ................ ................ .444444444444444  
> 4444444444444444
> 02110000: 4444444444444444 4444444444444444 4444444444444444  
> 4444444444444444
> 02150000: 4444444444444444 4444444444444444 4444444444444444  
> 4444444444444444

We can see the list's nodes filling up all available memory, before  
trigging a collect (4 - pages containing objects <=16 bytes in size).  
However, the collect stopped at the page at 020F1000. Let's get a closer  
look:

> M65024> binmap 020F1000
> Bin map for page 020F1000 - 020F2000:
> (object size = 0x010)            +100             +200             +300
> 020F1000: DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD  
> DDDDDDDDDDDDDDDD
> 020F1400: DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD D...............  
> ................
> 020F1800: ................ ................ ................  
> ................
> 020F1C00: ................ ................ ................  
> ................

Note that inside a page, objects are allocated in reverse order (due to  
how freelists are constructed). We can conclude that the rest of the  
linked list is not freed due to a stray reference to the node at 020F1600.  
To find the stray reference, we use the "refs" command:

> M65024> refs 020F1600 020F1610
> 00433D2C -> 020F160D - in root range 0042F000 - 00438A00

00433D2C is located within the program's static data segment. Let's see if  
it's in the map file:

> M65024> symbolat 00433D2C
> 00433D2C - ___rtl_criticalsection +30

This symbol is from snn.lib, the standard C library.

We can also dump the memory region around that address to examine it:

> M65024> dump 00433D00
> 00433D00: FF FF FF FF 00 00 00 00  00 00 00 00 00 00 00 00 | ....
> 00433D10: 00 00 00 00 01 16 02 02  03 02 04 18 05 0D 06 09 |      
> ............
> 00433D20: 07 0C 08 0C 09 0C 0A 07  0B 08 0C 16 0D 16 0F 02 |  
> ................
> 00433D30: 10 0D 11 12 12 02 21 0D  35 02 41 0D 43 02 50 11 |  
> ......!.5.A.C.P.
> 00433D40: 52 0D 53 0D 57 16 59 0B  6C 0D 6D 20 70 1C 72 09 | R.S.W.Y.l.m  
> p.r.
> 00433D50: 76 16 80 0A 81 0A 82 09  83 16 84 0D 91 29 9E 0D |  
> v............)..
> 00433D60: A1 02 A4 0B A7 0D B7 11  CE 02 D7 0B 00 00 00 00 | ............
> 00433D70: 7F 13 00 00 25 73 3A 20  00 00 00 00 25 73 0A 00 | ..  %s:      
> %s.
> 00433D80: 4E 6F 20 65 72 72 6F 72  00 4F 70 65 72 61 74 69 | No error  
> Operati
> 00433D90: 6F 6E 20 6E 6F 74 20 70  65 72 6D 69 74 74 65 64 | on not  
> permitted
> 00433DA0: 00 4E 6F 20 73 75 63 68  20 66 69 6C 65 20 6F 72 |  No such  
> file or
> 00433DB0: 20 64 69 72 65 63 74 6F  72 79 00 4E 6F 20 73 75 |  directory  
> No su
> 00433DC0: 63 68 20 70 72 6F 63 65  73 73 00 49 6E 74 65 72 | ch process  
> Inter
> 00433DD0: 72 75 70 74 65 64 20 66  75 6E 63 74 69 6F 6E 20 | rupted  
> function
> 00433DE0: 63 61 6C 6C 00 49 6E 70  75 74 2F 6F 75 74 70 75 | call  
> Input/outpu
> 00433DF0: 74 20 65 72 72 6F 72 00  4E 6F 20 73 75 63 68 20 | t error No  
> such

=== Conclusion ===

This problem isn't something that can be easily fixed. Any stray reference  
to any node in the list will prevent nodes following it from ever being  
deallocated. It can be avoided by avoiding memory structures that could be  
vulnerable to one stray pointer causing the entire structure "leaking".

For this particular case, the stray reference came from a C library's  
static data. Can the D compiler be modified to distinguish between C and D  
object files, and only add D object files' data to the list of GC roots?

-- 
Best regards,
  Vladimir                          mailto:thecybershadow at gmail.com



More information about the Digitalmars-d mailing list