Garbage collector collects live objects
Steven Schveighoffer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri Dec 12 07:50:26 PST 2014
On 12/12/14 7:52 AM, Ruslan Mullakhmetov wrote:
> On Thursday, 11 December 2014 at 18:36:59 UTC, Steven Schveighoffer wrote:
>> My analysis so far:
>>
>> 2. In the array append code, the block attributes are obtained via
>> GC.query, which has this code for getting the attributes:
>>
>> https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L1792
>>
>>
>> Quoting from that function:
>>
>> // reset the offset to the base pointer, otherwise the bits
>> // are the bits for the pointer, which may be garbage
>> offset = cast(size_t)(info.base - pool.baseAddr);
>> info.attr = getBits(pool, cast(size_t)(offset >> pool.shiftBy));
>>
>> Which should get the correct bits. I suspected there was an issue with
>> getting the wrong bits, but this code looks correct.
>>
>> 3. The runtime caches the block info for thread local data for append
>> speed. A potential issue is that the attributes are cached from a
>> previous use for that block, but the GC (and the runtime itself)
>> SHOULD clear that cache entry when that block is freed, avoiding this
>> issue. A potential way to check this is to assert in a debug build of
>> druntime that the cached block info always equals the actual block
>> info. Are you able to build a debug version of druntime to test this?
>> I can give you the changes you should make. This would explain the
>> great difficulty in reproducing the issue.
>
> I will try to build debug version of dmd compiler and check the issue.
A debug version of compiler is not necessary, not even a debug version
of phobos, just druntime. But it's not going to matter yet, because I
need to give you the asserts to put in there. I just wanted to know if
you needed help doing it.
>
>>
>> 4. If your code is multi-threaded, but using __gshared, it can make
>> the cache incorrect. Are you doing this?
>>
>
> the app is multi-threaded via std.concurrency.
This should be OK, you should not be able to share data that is not
marked as shared.
> there is only one known to me place where __gshared is used: logging
> library (checked by searching through whole source tree). make stub for
> this lib and try, so identify whether cache invalidated by _gshared or not.
Here is where it might occur:
1. Due to shared data having typeinfo attached to it that it is actually
shared, the runtime takes advantage of that. We can use a lock-free
cache that is thread-local for anything not marked as shared, because
nothing outside the thread can access that data.
2. __gshared gets around this because it is not marked as shared by the
compiler. This means, if you, for instance, appended to a __gshared
array, the runtime would treat it like a thread-local array. If you did
this from multiple threads, the cache may be invalid in one or more of them.
3. Actual 'shared' arrays are not permitted to use the cache, so they
should not have this issue.
I see that you removed the only instance of __gshared and it did not
help. That at least rules that out.
>> But the cache is really the only possible place I can see where the
>> bits are set incorrectly, given that you just verified the bits are
>> correct before the append.
>>
>> Can you just list the version of the compiler you are using? I want to
>> make sure this isn't an issue that has already been fixed.
>
> the last. first of all i updated whole toolchain (dmd, dub).
>
> $ dmd
> DMD64 D Compiler v2.066.1
Thanks, this at least gives me a baseline to know what to test and debug
with. I do not believe the code has had any significant fixes that would
help with this issue since then.
>
> I started looking druntime and dmd source code myself before i checked
> the thread (thsnks for your help and feedback) and i have some
> questions. could you explain to me something?
>
> i_m looking here
> https://github.com/D-Programming-Language/druntime/blob/v2.066.1/src/rt/lifetime.d#L591
>
>
> -------
> line #603
> auto size = ti.next.tsize;
>
> why `next`? it can be even null if this is last TypeInfo in the linked
> list.
This is the way the compiler constructs the type info. The first
TypeInfo is always TypeInfo_Array (or TypeInfo_Shared, or const or
whatever), and the .next is the typeinfo for the element type. all this
does is get the size of an element. Since we know we are dealing with an
array, we know next is always valid.
> btw, i used suggested trackallocs.d and GC defenetely receives NO_SCAN
>
> before tag: 1 len: 2 ptr: 103A78058 root: 103A77000:8192 attr: APPENDABLE
> gc_qalloc(41, NO_SCAN APPENDABLE ) cc: 29106 asz: 10152603, ti: null
> ret: BlkInfo_(104423800, 64, 10)
> after tag: 1 len: 3 ptr: 104423810 root: 104423800:64 attr: NO_SCAN
> APPENDABLE
This is good information, thanks. I will get back to you with a druntime
branch to try. Can I email you at this address? If not, email me at the
address from my post to let me know your contact, no reason to work
through building issues on the public forum :)
-Steve
More information about the Digitalmars-d-learn
mailing list