Openwrt Linux Uclibc ARM GC issue

Joakim dlang at joakim.fea.st
Sun Dec 17 19:05:04 UTC 2017


On Sunday, 17 December 2017 at 17:12:41 UTC, Radu wrote:
> On Friday, 15 December 2017 at 14:24:08 UTC, David Nadlinger 
> wrote:
>> On 15 Dec 2017, at 14:06, Radu via digitalmars-d-ldc wrote:
>>> When run, I get this error spuriously:
>>>
>>> ====================================
>>> core.exception.AssertError at rt/sections_elf_shared.d(116): 
>>> Assertion failure
>>> Fatal error in EH code: _Unwind_RaiseException failed with 
>>> reason code: 9
>>> Aborted (core dumped)
>>> ====================================
>>
>> The assert is inside an invariant which checks that the TLS 
>> information has been extracted successfully. Perhaps uclibc 
>> uses a TLS implementation that is not ABI-compatible with 
>> glibc? (druntime needs to determine the TLS ranges to register 
>> them with the GC, for the main thread as well as newly spawned 
>> ones.)
>>
>> Where in the program lifecycle does the error occur? From the 
>> backtrace, it looks like during C runtime startup, in which 
>> case I am not quite seeing the connection to the GC.
>>
>> Why unwinding fails is another question, but not one I would 
>> be terribly worried about – it is possible that the error e.g. 
>> just occurs too early for the EH machinery to be properly set 
>> up yet. Other low-level parts of druntime have been converted 
>> to directly abort (e.g. using assert(0)) instead. In fact, I 
>> am about to overhaul sections_elf_shared in that respect 
>> anyway to improve error reporting when mixing shared and 
>> non-shared builds.
>>
>>  — David
>
> My various attempts at getting it to run behaved very erratically.
> So I changed the parameters for cross compile, basically I 
> removed all architecture specifics leaving only 
> `-mtriple=arm-linux-gnueabihf`, and `-mfloat-abi=hard` on C 
> side.
>
> My testing hardware is an ARM Cortex-A7, 
> http://linux-sunxi.org/A33

I believe that triple defaults to ARMv5. Are you sure your 
Openwrt kernel is built for ARMv7?  Try running uname -m on the 
device to check.  For example, most low- to mid-range smartphones 
these days ship with ARMv8 chips, but the kernel is only built for 
32-bit ARMv7, so they can only run 32-bit apps.

> With the compiler switches changed I could run my test program 
> and try the druntime test runner (albeit with some changes on 
> math and stdio to get it linking):
>
> ./druntime-test-runner
> 0.000s PASS release32 core.atomic
> 0.000s PASS release32 core.bitop
> 0.000s PASS release32 core.checkedint
> 0.005s PASS release32 core.demangle
> 0.000s PASS release32 core.exception
> 0.002s PASS release32 core.internal.arrayop
> 0.000s PASS release32 core.internal.convert
> 0.000s PASS release32 core.internal.hash
> 0.000s PASS release32 core.internal.string
> 0.000s PASS release32 core.math
> 0.000s PASS release32 core.memory
> 0.002s PASS release32 core.sync.barrier
> 0.015s PASS release32 core.sync.condition
> 0.000s PASS release32 core.sync.config
> 0.016s PASS release32 core.sync.mutex
> 0.016s PASS release32 core.sync.rwmutex
> 0.002s PASS release32 core.sync.semaphore
> Segmentation fault (core dumped)
>
> The seg fault is from core.thread:1351
>
> unittest
> {
>     auto t1 = new Thread({
>         foreach (_; 0 .. 20)
>             Thread.getAll;
>     }).start;
>     auto t2 = new Thread({
>         foreach (_; 0 .. 20)
>             GC.collect; // this seg faults
>     }).start;
>     t1.join();
>     t2.join();
> }
>
> Calling GC.collect from the main thread doesn't seg fault.

Try running core.thread alone, with ./druntime-test-runner 
core.thread, and see if it makes a difference, as I've sometimes 
seen tested modules interfere with each other.  I also see that 
there are a few places where Glibc is assumed in core.thread; make 
sure those are right on Uclibc too:

https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3301
https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3410
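One quick way to see which code paths such a build would pick up is to print the C runtime version identifier the compiler sets for your triple. This is only a sketch: it assumes the Glibc-specific code is selected through identifiers like CRuntime_Glibc, the file name is made up, and CRuntime_UClibc in particular may not be defined by the compiler you are using.

```d
// crt_probe.d (hypothetical file name, not part of druntime).
// Reports which C runtime version identifier is active for the
// target this file is compiled for.
version (CRuntime_Glibc)
    enum crt = "CRuntime_Glibc";
else version (CRuntime_Musl)
    enum crt = "CRuntime_Musl";
else version (CRuntime_UClibc) // may not exist in your compiler
    enum crt = "CRuntime_UClibc";
else version (CRuntime_Bionic)
    enum crt = "CRuntime_Bionic";
else
    enum crt = "unknown";

// Printed at compile time, so it also works when you cannot run
// the binary on the build host.
pragma(msg, "C runtime version identifier: ", crt);

void main()
{
    import core.stdc.stdio : printf;

    // Printed at run time on the device.
    printf("built with %.*s\n", cast(int) crt.length, crt.ptr);
}
```

If this reports CRuntime_Glibc for a Uclibc target, the Glibc branches are the ones being compiled in, and those are the ones to compare against what Uclibc actually provides.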

You can also try skipping those tests that segfault for now and 
make sure everything else works, by adding something like 
version(skip) before that failing unittest block, so you know the 
extent of the test problems.
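A minimal sketch of what I mean, using a standalone file rather than the real core.thread source. The identifier skip is arbitrary; anything that is not defined on the command line will do.

```d
// skip_demo.d (hypothetical file name, not part of druntime).
import core.memory : GC;
import core.thread : Thread;

// This block is compiled in only if the build passes
// -d-version=skip (LDC) or -version=skip (DMD). Without that
// flag it is left out entirely, so the test never runs.
version (skip)
unittest
{
    auto t = new Thread({
        foreach (_; 0 .. 20)
            GC.collect();
    }).start();
    t.join();
}

// An ordinary unittest, still run by -unittest builds.
unittest
{
    assert(GC.addrOf(null) is null);
}

void main() {}
```

Building with -unittest runs only the second block; adding -d-version=skip brings the first one back once you want to look at it again.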

> Core dump is not very helpful - stack is garbage, but running 
> with gdbserver a minimal program with the unit test I can see 
> this:
>
> Thread 1 "test" received signal SIGUSR1, User defined signal 1.
> pthread_getattr_np (thread_id=0, attr=0xb6b302bc) at 
> libpthread/nptl/pthread_getattr_np.c:47
> 47        iattr->schedpolicy = thread->schedpolicy;
> (gdb) step
>
> Thread 1 "test" received signal SIGUSR2, User defined signal 2.
> 0xb6e50d80 in epoll_wait (epfd=-1090521272, events=0x8, 
> maxevents=2, timeout=-1224756080) at 
> libc/sysdeps/linux/common/epoll.c:58
> 58      CANCELLABLE_SYSCALL(int, epoll_wait, (int epfd, struct 
> epoll_event *events, int maxevents, int timeout),
> (gdb) step
>
> Thread 1 "test" received signal SIGSEGV, Segmentation fault.
> 0xfffffffc in ?? ()
> (gdb)

SIGUSR1 and SIGUSR2 are the signals druntime uses to suspend and 
resume threads around a collection, so seeing both suggests the GC 
itself ran fine.  You'd need to delve more into the code and the 
implementation details mentioned above to track this down.

On Sunday, 17 December 2017 at 17:20:32 UTC, Radu wrote:
> Yes - latest LDC versions make cross compiling a breeze so 
> kudos to you guys for making this happen. I'm using the Linux 
> subsystem for Windows, btw, so for me this is even more fun as I 
> can work on both environments natively :)

Yeah, you could just use the Windows ldc too, assuming you have a 
cross-compiler from that OS, as shown on the wiki for Windows 
with the Android NDK.

> The modifications needed are, on the surface, very few - some 
> math and memory stream functions are missing.

I don't know how much it differs from Glibc, but we'd always be 
interested in a port, assuming you have the time to submit a pull 
like this recent one for Musl:

https://github.com/dlang/druntime/pull/1997

> The road block looks to be somewhere in the GC and TLS, or the 
> interaction of them (at least this is my feeling ATM)

Not being able to do an explicit collect there isn't that big a 
deal: I'd skip that test for now and run everything else, then 
come back to that one once you have an idea of the bigger picture.

