Garbage collector collects live objects

Steven Schveighoffer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri Dec 12 07:50:26 PST 2014


On 12/12/14 7:52 AM, Ruslan Mullakhmetov wrote:
> On Thursday, 11 December 2014 at 18:36:59 UTC, Steven Schveighoffer wrote:
>> My analysis so far:
>>
>> 2. In the array append code, the block attributes are obtained via
>> GC.query, which has this code for getting the attributes:
>>
>> https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L1792
>>
>>
>> Quoting from that function:
>>
>> // reset the offset to the base pointer, otherwise the bits
>> // are the bits for the pointer, which may be garbage
>> offset = cast(size_t)(info.base - pool.baseAddr);
>> info.attr = getBits(pool, cast(size_t)(offset >> pool.shiftBy));
>>
>> Which should get the correct bits. I suspected there was an issue with
>> getting the wrong bits, but this code looks correct.
>>
>> 3. The runtime caches the block info for thread local data for append
>> speed. A potential issue is that the attributes are cached from a
>> previous use for that block, but the GC (and the runtime itself)
>> SHOULD clear that cache entry when that block is freed, avoiding this
>> issue. A potential way to check this is to assert in a debug build of
>> druntime that the cached block info always equals the actual block
>> info. Are you able to build a debug version of druntime to test this?
>> I can give you the changes you should make. This would explain the
>> great difficulty in reproducing the issue.
>
> I will try to build debug version of dmd compiler and check the issue.

A debug version of the compiler is not necessary, and neither is a debug 
version of phobos; only druntime needs it. But it's not going to matter 
yet, because I still need to give you the asserts to put in there. I 
just wanted to know if you needed help doing it.

>
>>
>> 4. If your code is multi-threaded, but using __gshared, it can make
>> the cache incorrect. Are you doing this?
>>
>
> the app is multi-threaded via std.concurrency.

This should be OK; you should not be able to share data that is not 
marked as shared.

> there is only one place known to me where __gshared is used: the logging
> library (checked by searching through the whole source tree). i will make
> a stub for this lib and try, to identify whether the cache is invalidated
> by __gshared or not.

Here is where it might occur:

1. Shared data has typeinfo attached that marks it as shared, and the 
runtime takes advantage of that. We can use a lock-free, thread-local 
cache for anything not marked as shared, because nothing outside the 
thread can access that data.
2. __gshared gets around this because the compiler does not mark it as 
shared. This means that if you, for instance, appended to a __gshared 
array, the runtime would treat it like a thread-local array. If you did 
this from multiple threads, the cache could become invalid in one or 
more of them.
3. Actual 'shared' arrays are not permitted to use the cache, so they 
should not have this issue.
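
To make points 1 and 2 concrete, here is a minimal sketch of how the 
two declarations look to the type system. The variable names and the 
isMarkedShared helper are made up for illustration; they are not part of 
druntime, and the runtime's own check may be written differently.

```d
shared int[] sharedArr;      // the type itself carries `shared`
__gshared int[] globalArr;   // storage class only: the type is plain int[]
int[] localArr;              // thread-local by default

// `shared` is recorded in the type; `__gshared` leaves no trace there.
static assert(is(typeof(sharedArr) == shared(int[])));
static assert(is(typeof(globalArr) == int[]));
static assert(is(typeof(localArr) == int[]));

// Hypothetical helper: true if the typeinfo for T is a TypeInfo_Shared.
bool isMarkedShared(T)()
{
    return cast(const(TypeInfo_Shared)) typeid(T) !is null;
}

void main()
{
    // Only the `shared` declaration is visible as shared at run time.
    assert(isMarkedShared!(typeof(sharedArr)));
    assert(!isMarkedShared!(typeof(globalArr)));

    // The __gshared array has the same typeinfo as a thread-local one,
    // so nothing tells the runtime that other threads can reach it.
    assert(typeid(typeof(globalArr)).toString()
        == typeid(typeof(localArr)).toString());
}
```

Since the typeinfo is all the append code gets to look at, a __gshared 
array is indistinguishable from a thread-local one at that point.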

I see that you removed the only instance of __gshared and it did not 
help. That at least rules that out.

>> But the cache is really the only possible place I can see where the
>> bits are set incorrectly, given that you just verified the bits are
>> correct before the append.
>>
>> Can you just list the version of the compiler you are using? I want to
>> make sure this isn't an issue that has already been fixed.
>
> the last. first of all i updated whole toolchain (dmd, dub).
>
> $ dmd
> DMD64 D Compiler v2.066.1

Thanks, this at least gives me a baseline to know what to test and debug 
with. I do not believe the code has had any significant fixes that would 
help with this issue since then.

>
> I started looking at druntime and dmd source code myself before i checked
> the thread (thanks for your help and feedback) and i have some
> questions. could you explain something to me?
>
> i'm looking here
> https://github.com/D-Programming-Language/druntime/blob/v2.066.1/src/rt/lifetime.d#L591
>
>
> -------
> line #603
> auto size = ti.next.tsize;
>
> why `next`? it can be even null if this is last TypeInfo in the linked
> list.

This is the way the compiler constructs the type info. The first 
TypeInfo is always TypeInfo_Array (or TypeInfo_Shared, or const or 
whatever), and the .next is the typeinfo for the element type. All this 
line does is get the size of an element. Since we know we are dealing 
with an array, we know next is always valid.
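
You can see the same relationship from user code. This is only a small 
sketch: elementSize and Pair are names invented for the example, and the 
real code in lifetime.d works on the TypeInfo the compiler passes in 
rather than calling typeid itself.

```d
import std.stdio : writeln;

struct Pair { long a; int b; }   // hypothetical element type

// Mirrors `ti.next.tsize`: the array's TypeInfo comes first, and
// .next is the TypeInfo of the element type.
size_t elementSize(const TypeInfo arrayTi)
{
    return arrayTi.next.tsize;
}

void main()
{
    assert(elementSize(typeid(int[])) == int.sizeof);
    assert(elementSize(typeid(Pair[])) == Pair.sizeof);

    // A qualifier wrapper still forwards .next to the element type.
    assert(elementSize(typeid(shared(int[]))) == int.sizeof);

    // The array's own tsize is the size of the slice (length + pointer),
    // which is not what the append code wants.
    assert(typeid(int[]).tsize == (int[]).sizeof);

    writeln(typeid(int[]).next.toString());   // name of the element type
}
```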


> btw, i used the suggested trackallocs.d and the GC definitely receives NO_SCAN
>
> before tag: 1 len: 2 ptr: 103A78058 root: 103A77000:8192 attr: APPENDABLE
> gc_qalloc(41, NO_SCAN APPENDABLE ) cc: 29106 asz: 10152603, ti: null
> ret: BlkInfo_(104423800, 64, 10)
> after tag: 1 len: 3 ptr: 104423810 root: 104423800:64 attr: NO_SCAN APPENDABLE

This is good information, thanks. I will get back to you with a druntime 
branch to try. Can I email you at this address? If not, email me at the 
address from my post to let me know your contact, no reason to work 
through building issues on the public forum :)
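
In the meantime, the block attributes can also be checked from user code 
through core.memory, without rebuilding anything. A rough sketch, where 
hasAttr is a helper written just for this example and the arrays are 
stand-ins for your real data:

```d
import core.memory : GC;
import std.stdio : writefln;

// Hypothetical helper: does the GC block holding `p` have `attr` set?
bool hasAttr(const void* p, GC.BlkAttr attr)
{
    auto info = GC.query(p);   // BlkInfo with base, size and attr
    return (info.attr & attr) != 0;
}

void main()
{
    int[] ints = [1, 2];           // elements contain no pointers
    int*[] ptrs = [null, null];    // elements are pointers

    ints ~= 3;                     // an append may move the data
    ptrs ~= null;                  // to a new block

    // A block of ints does not need scanning; a block of pointers does.
    assert(hasAttr(ints.ptr, GC.BlkAttr.NO_SCAN));
    assert(!hasAttr(ptrs.ptr, GC.BlkAttr.NO_SCAN));

    writefln("ints attr: %s, ptrs attr: %s",
        GC.query(ints.ptr).attr, GC.query(ptrs.ptr).attr);
}
```

If the array in your program holds pointers and a check like this 
reports NO_SCAN after an append, that matches the symptom in your trace.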

-Steve

