Program locked at joinAll and sched_yield

tcak via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Jul 3 11:25:32 PDT 2016


On Sunday, 3 July 2016 at 17:19:04 UTC, Lodovico Giaretta wrote:
> On Friday, 1 July 2016 at 12:02:11 UTC, tcak wrote:
>> I have my own Http Server. Every request is handled by a 
>> thread, and threads are reused.
>>
>> I send 35,000 request (7 different terminals are sending 5000 
>> requests each) to the server again and again (each of them 
>> lives for short).
>>
>> Anyway, everything works great, there is no problem at all.
>>
>> I put "readln" in main function. So, when I press enter, all 
>> currently idle threads are stopped. (I use thread.join()).
>>
>> Problem is that, all threads are stopped, by the last thread 
>> Thread#1 gets locked at sched_yield(), it uses one of CPU 
>> cores at 100%, and program never quits and stays there.
>>
>> There is only one remaining thread at the end, and below is 
>> its stack trace.
>>
>> sched_yield() in 
>> /build/glibc-GKVZIf/glibc-2.23/posix/../sysdeps/unix/syscall-template.S:84
>>
>> thread_joinAll() in
>>
>> rt_term() in
>>
>> rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) 
>> function).runAll()() in
>>
>> rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) 
>> function).tryExec(scope void() delegate)() in
>>
>> _d_run_main() in
>>
>> main() in
>>
>> __libc_start_main(int (*)(int, char **, char **) main, int 
>> argc, char ** argv, int (*)(int, char **, char **) init, void 
>> (*)(void) fini, void (*)(void) rtld_fini, void * stack_end) in 
>> /build/glibc-GKVZIf/glibc-2.23/csu/../csu/libc-start.c:291
>>
>> _start() in
>>
>>
>> Is there any known issue about this? or anything that is known 
>> to cause this problem?
>
> Hi!
>
> Can you provide a reduced test case that shows the issue? 
> Without any code, it's difficult to tell what's going on.

Well, I have actually found the cause of the issue, and worked 
around it a different way.

I put a memory limit on the process for testing.

At some point, due to the memory limit, the thread.start() method 
fails. But this method does not roll back its bookkeeping on 
failure, and the runtime (core.thread is part of druntime, not 
Phobos) thinks that the thread has been started correctly.

This happens, if I understand correctly, because of the variable 
"nAboutToStart" in core.thread, line 685. Its value is increased 
there, and is decreased by 1 in the "add" function on line 1775. 
When start() fails, add() is never called for that thread, so the 
counter never returns to zero and thread_joinAll() on line 2271 
gets into an endless loop. As a result, the program cannot quit, 
and the loop uses 100% CPU.
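
To make the mechanism concrete, here is a minimal model of the counter 
logic described above. It is only a sketch: the Registry struct, its 
methods, and the nRunning field are hypothetical stand-ins, not the 
actual core.thread code, and the OS-level failure is simulated with a 
boolean flag.

```d
import std.stdio;

// Hypothetical stand-in for the runtime's bookkeeping; NOT real druntime code.
struct Registry
{
    int nAboutToStart; // threads announced by start() but not yet registered
    int nRunning;      // threads registered by add() and still alive (assumed field)

    // Mimics start(): the counter is bumped before the OS thread is created.
    bool start(bool osThreadCreated)
    {
        ++nAboutToStart;
        if (!osThreadCreated)
            return false; // failure path: the counter is never rolled back
        add();            // in the real runtime the new thread registers itself
        return true;
    }

    // Mimics add(): registering the thread drops the counter again.
    void add()
    {
        --nAboutToStart;
        ++nRunning;
    }

    // The thread has finished its work.
    void finish()
    {
        --nRunning;
    }

    // Mimics the condition thread_joinAll() waits for before returning.
    bool joinAllCanReturn() const
    {
        return nAboutToStart == 0 && nRunning == 0;
    }
}

void main()
{
    Registry r;

    r.start(true);                 // normal start
    r.finish();
    writeln(r.joinAllCanReturn()); // true

    r.start(false);                // simulated failure, e.g. out of memory
    writeln(r.joinAllCanReturn()); // false, so a waiting loop would spin forever
}
```

In this model, a single failed start() is enough to keep the exit 
condition false for the rest of the program's life, which matches the 
spinning seen in the stack trace.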

---

What I did to solve this issue is create my threads with the 
pthread_create() function and call thread_attachThis() in each of 
them. This way, the problem is avoided.
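
A rough sketch of that workaround, assuming a POSIX system. The 
worker() and spawnWorker() functions and the counter argument are 
hypothetical names used for illustration; only pthread_create(), 
pthread_join(), thread_attachThis() and thread_detachThis() come from 
druntime.

```d
import core.sys.posix.pthread;
import core.thread;
import std.stdio;

// Entry point with C linkage, as pthread_create() expects.
extern (C) void* worker(void* arg)
{
    // Register this OS thread with the D runtime so the GC can see it.
    thread_attachThis();
    // Unregister again before the thread exits.
    scope (exit) thread_detachThis();

    // Hypothetical placeholder for the real request handling.
    auto counter = cast(int*) arg;
    *counter += 1;
    return null;
}

// Returns false when the OS refuses to create the thread.
// No runtime counter has been touched at that point.
bool spawnWorker(ref pthread_t tid, int* counter)
{
    return pthread_create(&tid, null, &worker, counter) == 0;
}

void main()
{
    int handled;
    pthread_t tid;

    if (!spawnWorker(tid, &handled))
    {
        stderr.writeln("could not create thread");
        return;
    }

    pthread_join(tid, null); // wait for the worker before reading the result
    writeln(handled);
}
```

Because the thread only registers itself once it is actually running, 
a failed pthread_create() leaves nothing behind for thread_joinAll() 
to wait on. Exceptions must not escape worker(), since it is called 
from C code.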

---

As a solution, when thread creation fails in the start() method, 
we should decrease the value of "nAboutToStart" by 1, but it 
seems that "pAboutToStart" also needs to be touched to recover 
the system properly. Fortunately there is not much code in the 
start() method.
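
The rollback could look roughly like the following. This is a model 
and not a patch against core.thread: the Registry struct and its 
"pending" array are hypothetical, with the array playing the role of 
"pAboutToStart" and its length the role of "nAboutToStart". It also 
assumes that the failing thread is the last entry, as it would be if 
start() holds a lock for its whole duration.

```d
import std.stdio;

// Hypothetical model of the proposed fix; NOT the real core.thread code.
struct Registry
{
    // Plays the role of pAboutToStart; pending.length plays nAboutToStart.
    Object[] pending;

    void start(Object t, bool osThreadCreated)
    {
        pending ~= t;

        // Undo the bookkeeping if anything below throws.
        // Assumption: t is still the last entry at this point.
        scope (failure) pending = pending[0 .. $ - 1];

        if (!osThreadCreated)
            throw new Exception("Error creating thread");
    }
}

void main()
{
    Registry r;

    try
    {
        r.start(new Object, false); // simulated creation failure
    }
    catch (Exception e)
    {
        writeln("start failed: ", e.msg);
    }

    writeln(r.pending.length); // 0, nothing is left waiting to start
}
```

Using scope (failure) keeps the list and the count in step, so a 
waiting loop like the one in thread_joinAll() would see no pending 
threads after the failure.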

