[phobos] std.parallelism's unit tests randomly hang on win32

David Simcha dsimcha at gmail.com
Wed May 4 12:04:16 PDT 2011


BTW, just to clarify, I am going to keep working on it. It's just that
progress is slow because this is such a nightmarish bug, and I'd like to not
hold up the release.  Therefore, I keep asking for help, to get this thing
fixed sooner rather than later.

On Wed, May 4, 2011 at 2:44 PM, David Simcha <dsimcha at gmail.com> wrote:

> Probably not.  The code includes things like waiting on condition variables
> and expecting to be resumed by other threads.
>
>
> On Wed, May 4, 2011 at 2:13 PM, Walter Bright <walter at digitalmars.com> wrote:
>
>> I guess I'm asking if there is a way to execute all those paths in a
>> single threaded manner, in order to flush out any suspected code gen bugs.
>>
>>
>> On 5/4/2011 10:39 AM, David Simcha wrote:
>>
>> Yes, it works as a single threaded program, but there are a lot of code
>> paths that are never taken unless a worker thread finishes a job before the
>> submitter thread needs the result (which obviously can't happen in
>> single-threaded mode).  Therefore, this does not prove that the issue is a
>> concurrency bug.
>>
>> On Wed, May 4, 2011 at 1:37 PM, Walter Bright <walter at digitalmars.com> wrote:
>>
>>> Does it work as a single threaded program?
>>>
>>>
>>> On 5/4/2011 6:51 AM, David Simcha wrote:
>>>
>>>> I went a slightly different route and tried to reduce the problem to as
>>>> small a test case as possible, like I would normally do for a compiler bug.
>>>> So far I've managed to reduce it to ~560 lines.  I've discovered this one's
>>>> more unstable (i.e. the results change a lot more in response to slight
>>>> perturbations) than I thought.  Just changing the layout of the Task struct
>>>> (deleting member variables that are no longer used anywhere) makes it go
>>>> from unit test failures to access violations. Adding or removing try/catch
>>>> blocks or empty destructors in some places can completely prevent the bug
>>>> from manifesting.  On Linux, if I perturb things slightly by changing the
>>>> layout of Task, I get exceptions thrown from core.sync.
>>>>
>>>> This looks like some kind of memory/stack corruption bug but due to its
>>>> nondeterminism (only a few thread interleavings seem to take the proper
>>>> codepath and I'm not sure which ones) and its very indirect manifestation
>>>> (memory corruption; the low order bit overwriting thing was, I think, just a
>>>> manifestation of a deeper problem), I am somewhat at a loss for how to debug
>>>> it.  I've scrutinized the concurrency related aspects and still can't find
>>>> any bugs there.  However, I can't prove it's not a concurrency bug since
>>>> running in single threaded mode prevents certain code paths from being
>>>> taken.  Unless I get some advice that changes things, I think my next move
>>>> is to compare the disassemblies for cases that work to those for cases that
>>>> don't.
>>>>
>>> _______________________________________________
>>> phobos mailing list
>>> phobos at puremagic.com
>>> http://lists.puremagic.com/mailman/listinfo/phobos
>>>
>>
>>