Hard-to-reproduce GC bug

Sean Kelly sean at invisibleduck.org
Fri Dec 5 14:17:56 PST 2008


== Quote from dsimcha (dsimcha at yahoo.com)'s article
>
> Thanks, though I'm way ahead of you in that I already did this.  Works great,
> except it's a little bit slow.
> I'm actually working on an implementation of the SuperStack proposed by Andrei
> about a month ago, which was why I needed good TLS.  It seems like with the
> current implementation (using the faster explicit key solution instead of the
> slower class-based solution), about 1/3 of my time is being spent on retrieving
> TLS.  I got this number by caching the stuff from TLS on the stack of the calling
> function and passing it in as a parameter.  This may become a semi-hidden feature
> for wringing out that last bit of performance from SuperStack.  Is TLS inherently
> slow, or is the druntime implementation relatively quick and dirty and likely to
> improve in the future?

The druntime implementation is about as fast as user-level TLS can get, I'm
afraid.  If you look at the implementation:

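// Abbreviated excerpt; unrelated members are omitted.
class ThreadLocal( T )
{
    T val()
    {
        // Fetch this thread's slot; fall back to the default if it is unset.
        Wrap* wrap = cast(Wrap*) Thread.getLocal( m_key );
        return wrap ? wrap.val : m_def;
    }
}

class Thread
{
    static void* getLocal( uint key )
    {
        // Second array index: the per-thread slot table.
        return getThis().m_local[key];
    }

    static Thread getThis()
    {
        // First array index, inside the OS call (Posix branch shown).
        version( Posix )
            return cast(Thread) pthread_getspecific( sm_this );
    }

    void*[LOCAL_MAX] m_local;
}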

The OS-level TLS call is typically implemented as an array indexing operation,
so to get a TLS value you're looking at indexing into two arrays (the OS's
per-thread key table, then m_local), a cast, and then an additional cast and
conditional jump if you use ThreadLocal.  Error checking is even omitted for
performance reasons.  If I knew of a way to make it faster, I would :-)
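
For anyone who wants to see the two lookups side by side, here is a rough
sketch in plain C (C because pthread_getspecific is a C call).  It is not
druntime code: thread_obj, tls_attach, get_this, get_local, the two sum_*
functions, and the LOCAL_MAX value of 64 are all made up for illustration,
loosely following the names in the excerpt above.  It also shows the
workaround mentioned in the quoted message: fetch the TLS pointer once in the
caller and pass it in as a parameter.

```c
#include <pthread.h>
#include <stddef.h>

#define LOCAL_MAX 64  /* assumed slot count, not the real druntime value */

/* Hypothetical stand-in for the per-thread Thread object. */
typedef struct {
    void *m_local[LOCAL_MAX];
} thread_obj;

static pthread_key_t  sm_this;
static pthread_once_t sm_once = PTHREAD_ONCE_INIT;

/* Create the process-wide key once (error handling omitted for brevity). */
static void make_key(void)
{
    (void)pthread_key_create(&sm_this, NULL);
}

/* Register 'self' as the calling thread's object; returns 0 on success. */
static int tls_attach(thread_obj *self)
{
    pthread_once(&sm_once, make_key);
    return pthread_setspecific(sm_this, self);
}

/* First lookup: the OS-level TLS call. */
static thread_obj *get_this(void)
{
    return (thread_obj *)pthread_getspecific(sm_this);
}

/* Second lookup: index into the per-thread slot table. No error checking. */
static void *get_local(unsigned key)
{
    return get_this()->m_local[key];
}

/* Pays for both lookups on every iteration. */
static long sum_lookup_each_time(unsigned key, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += *(int *)get_local(key);
    return total;
}

/* Caller fetched the pointer once; the loop never touches TLS. */
static long sum_with_cached(const int *value, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += *value;
    return total;
}
```

A caller would do something like sum_with_cached((int *)get_local(0), n) so
that the TLS cost is paid once per call instead of once per iteration.
Whether that saves anything like the 1/3 figure quoted above depends on the
workload; this sketch makes no claim about that.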


Sean


