Windows multi-threading performance issues on multi-core systems only

Steven Schveighoffer schveiguy at yahoo.com
Tue Dec 15 09:52:47 PST 2009


On Tue, 15 Dec 2009 12:23:01 -0500, dsimcha <dsimcha at yahoo.com> wrote:

> == Quote from Dan (dsstruthers at yahoo.com)'s article
>> My code does do considerable array appending, and I see exactly the
>> same issue as dsimcha points out above.  I would expect it is GC
>> related, but why for multiple cores only, I cannot fathom.
>> Thanks for the repro, dsimcha!  My code snippet would not have been as
>> straightforward.
>
> Two reasons:
>
> 1.  When GC info is queried, the last query is cached for (relatively)
> efficient array appending.  However, this cache is not thread-local.
> Therefore, if you're appending to arrays in two different threads
> simultaneously, they'll both keep evicting each other's cached GC info,
> causing it to have to be looked up again and again.
>
> 2.  Every array append requires a lock acquisition.  This is much more
> expensive if there's contention.
>
> Bottom line:  Array appending in multithreaded code is **horribly**
> broken and I'm glad it's being brought to people's attention.  For a
> temporary fix, pre-allocate or use std.array.Appender.

Yes, but why do multiple cores make the problem worse?  If it's the
lock, then I'd expect plain locking in multiple threads, without any
appending, to do worse on multiple cores than on a single core.  If it's
the lookup, why does the lookup take longer on multiple cores?  The very
idea that multiple cores make threaded code *slower* goes against
everything I've ever heard about multi-core and threads.
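
One way to test the lock hypothesis in isolation would be something like
the sketch below.  It is only a rough illustration, assuming a current D2
compiler and druntime (core.sync.mutex, core.thread, core.time); the
function name contendedIncrements and the iteration counts are made up
for the example.  It does no appending at all, so any single-core versus
multi-core difference it shows would come from the lock alone.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;
import core.time : MonoTime;
import std.stdio : writefln;

// Hypothetical helper: nThreads threads each take and release one shared
// mutex `iterations` times, bumping a counter while holding it.
size_t contendedIncrements(size_t nThreads, size_t iterations)
{
    auto lock = new Mutex;
    size_t counter = 0;

    void worker()
    {
        foreach (i; 0 .. iterations)
        {
            lock.lock();
            scope (exit) lock.unlock();
            ++counter; // only ever touched while the mutex is held
        }
    }

    Thread[] threads;
    foreach (t; 0 .. nThreads)
    {
        auto th = new Thread(&worker);
        th.start();
        threads ~= th;
    }
    foreach (th; threads)
        th.join();

    return counter;
}

void main()
{
    enum iterations = 1_000_000; // arbitrary, chosen for illustration
    foreach (nThreads; [1, 2, 4])
    {
        auto start = MonoTime.currTime;
        auto total = contendedIncrements(nThreads, iterations);
        auto elapsed = MonoTime.currTime - start;
        assert(total == nThreads * iterations);
        writefln("%s thread(s): %s ms", nThreads, elapsed.total!"msecs");
    }
}
```

Running the same binary once with the process pinned to a single core and
once without pinning should show whether contention on the lock by itself
gets worse with more cores.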

I agree array appending for multithreaded code is not as efficient as it  
is when you use a dedicated append-friendly object, but it's a compromise  
between efficiency of appending (which arguably is not that common) and  
efficiency for everything else (slicing, passing to a function, etc).  I
expect that very soon we will have efficient appending with thread-local
caching of lookups, but the multi-core thing really puzzles me.
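
For reference, the two workarounds mentioned in the quoted message
(pre-allocating, or using std.array.Appender as the append-friendly
object) might look roughly like this.  It is a minimal sketch, assuming a
current Phobos; the helper names buildWithAppender and buildPreallocated
are invented for the example.  Each thread would keep its own Appender or
pre-sized array.

```d
import std.array : appender;

// Hypothetical helper: collect n values through an Appender, which tracks
// its own capacity instead of querying the GC on every append.
int[] buildWithAppender(size_t n)
{
    auto app = appender!(int[])();
    app.reserve(n); // optional: one up-front allocation
    foreach (i; 0 .. n)
        app.put(cast(int) i);
    return app.data;
}

// Hypothetical helper: allocate the whole array once and fill it by index,
// so no append (and no per-element GC query) happens at all.
int[] buildPreallocated(size_t n)
{
    auto result = new int[](n);
    foreach (i, ref x; result)
        x = cast(int) i;
    return result;
}

void main()
{
    assert(buildWithAppender(5) == [0, 1, 2, 3, 4]);
    assert(buildPreallocated(5) == [0, 1, 2, 3, 4]);
}
```

Either way the hot loop no longer goes through the built-in ~= operator,
which is the path the quoted message identifies as taking the lock and
using the shared cache.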

-Steve



More information about the Digitalmars-d mailing list