[Dlang-internal] GC experts: Performance when using many small ranges?
Johannes Pfau via Dlang-internal
dlang-internal at puremagic.com
Thu Jul 27 23:23:07 PDT 2017
I've already asked on the main newsgroup, but it seems this didn't
catch the attention of our GC experts:
http://forum.dlang.org/thread/oka1vo$4sr$1@digitalmars.com
Basically I want to get emulated TLS working in GDC and wonder
whether we could somehow integrate with the GCC emutls code. We'd
need to post some patches for the libgcc emutls code so I'm
interested in the best way to implement the GC scanning,
particularly regarding performance.
The main problem is that GCC emutls allocates every single TLS
variable in every thread using a malloc call. So we have lots of
independent memory ranges. How does the GC perform in such
situations, assuming I add an interface to libgcc to iterate all
allocated memory ranges and use the scanDG delegate in
rt.sections / rt.tlsgc?
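
To make option 1 concrete, here is a rough C sketch of what such an iteration interface might look like. Everything in it is hypothetical: the struct layout, tls_get, tls_iterate_ranges and the callback type are illustrative names only, not the actual libgcc emutls code or an existing API. The callback stands in for the scanDG delegate; the point is simply that the GC would receive one small range per allocated variable per thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one thread's emutls state: a pointer array with
   one separately malloc'ed block per TLS variable. Not libgcc's layout. */
struct tls_slot {
    void  *ptr;   /* malloc'ed storage, NULL until first access */
    size_t size;  /* size of the variable in bytes */
};

struct tls_array {
    size_t           count;
    struct tls_slot *slots;
};

/* Callback type playing the role of the D runtime's scan delegate. */
typedef void (*tls_range_cb)(void *begin, void *end, void *ctx);

/* Lazily allocate variable `index` on first access, zero-initialized. */
void *tls_get(struct tls_array *arr, size_t index, size_t size)
{
    assert(index < arr->count);
    struct tls_slot *s = &arr->slots[index];
    if (s->ptr == NULL) {
        s->ptr = calloc(1, size);
        assert(s->ptr != NULL);
        s->size = size;
    }
    return s->ptr;
}

/* Hypothetical iteration interface: report every allocated range of this
   thread to the callback. Returns the number of ranges reported. */
size_t tls_iterate_ranges(const struct tls_array *arr,
                          tls_range_cb cb, void *ctx)
{
    size_t n = 0;
    for (size_t i = 0; i < arr->count; i++) {
        const struct tls_slot *s = &arr->slots[i];
        if (s->ptr != NULL) {
            cb(s->ptr, (char *)s->ptr + s->size, ctx);
            n++;
        }
    }
    return n;
}

/* Example callback: add up the bytes that would have to be scanned. */
void count_bytes(void *begin, void *end, void *ctx)
{
    *(size_t *)ctx += (size_t)((char *)end - (char *)begin);
}
```

With N threads and M accessed TLS variables, the scan callback would be invoked up to N * M times per collection, which is the cost I'm asking about.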
An alternative could be to somehow implement support for custom
allocators in GCC emutls and allocate all our D TLS variables
using the GC. We'd still have to scan the per-thread TLS pointer
array to avoid pinning all GC allocations, but this should work.
The main drawback is considerable bloat in the data segment, since
every variable would have to store a pointer to the allocation function.
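
A rough C sketch of option 2, again purely illustrative: tls_control, tls_control_ext, tls_alloc_var and fake_gc_malloc are made-up names and do not reflect the real libgcc structures or the druntime API. It assumes each TLS variable has a control object in the data segment and shows where the extra pointer per variable comes from. Alignment handling is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocation hook; a GC-backed allocator would go here. */
typedef void *(*tls_alloc_fn)(size_t size);

/* Hypothetical per-variable control object without a custom allocator. */
struct tls_control {
    size_t      size;   /* size of the variable */
    size_t      align;  /* required alignment (unused in this sketch) */
    const void *templ;  /* initial image, or NULL for zero-init */
};

/* The same object extended with an allocator pointer. This extra field
   is stored once per TLS variable, hence the data segment growth. */
struct tls_control_ext {
    size_t       size;
    size_t       align;
    const void  *templ;
    tls_alloc_fn alloc;  /* NULL means fall back to malloc */
};

/* Allocate and initialize storage for one variable in one thread. */
void *tls_alloc_var(const struct tls_control_ext *obj)
{
    tls_alloc_fn alloc = obj->alloc ? obj->alloc : malloc;
    void *p = alloc(obj->size);
    assert(p != NULL);
    if (obj->templ != NULL)
        memcpy(p, obj->templ, obj->size);
    else
        memset(p, 0, obj->size);
    return p;
}

/* Stand-in for a GC allocation function; it only counts calls and
   forwards to malloc so the sketch stays self-contained. */
size_t fake_gc_calls;

void *fake_gc_malloc(size_t size)
{
    fake_gc_calls++;
    return malloc(size);
}
```

In this model the memory itself would be GC-managed, so only the per-thread pointer array would need to be scanned as a root range.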
(FYI, more details about the GCC emutls implementation are given
in the linked forum thread)
So what do you think is best for GC performance? Option 1 would
be a rather simple extension in libgcc, option 2 is more
intrusive.
-- Johannes