How to make D resolve C++ symbols by mangling symbols with the Itanium ABI on Windows

Carl Sturtivant sturtivant at gmail.com
Thu Feb 29 19:54:24 UTC 2024


On Monday, 26 February 2024 at 13:36:42 UTC, thumbgun wrote:
> I'm currently trying to call some C++ functions that were 
> compiled by g++ (mingw). However g++ uses the Itanium ABI name 
> mangling rules. dmd on Windows tries to link functions based on 
> the MSVC name mangling rules.
> [...]
> Is there any way I can make dmd link to symbols mangled 
> according to the Itanium ABI's rules on Windows?

Here's a simple way to do this with *no change to source code* in 
either the dynamic library or the DMD project that is supposed to 
dynamically link to it. Of course this doesn't resolve 
potential C++ calling convention issues (that don't exist for C), 
but now anyone is in a position to investigate when they exist.

I made a tiny proof of concept and it works. For concreteness, 
suppose the dynamic library is `libx.dll`, built with the 
(mingw64) `gcc` installed with the latest 
[MSYS2](https://www.msys2.org/) as that led to all the utilities 
I needed and a bash command line.

Suppose also the DMD project executable when built will be 
`main.exe` compiled from `main.d` and `other.d` and a D interface 
file `header.di` containing the necessary declarations for using 
`libx.dll`. I'll state the obvious below to make the explanation 
complete and for snag free experimentation.

Suppose for a moment there's no mangling problem because 
`libx.dll` is compiled from C source, not C++. I'll describe the 
exact context and then show how to fix it up for C++ with the 
mangling problem solved.

To dynamically link to `libx.dll` from a DMD executable 
`main.exe`, DMD needs to link an *implib* ([import 
library](https://en.wikipedia.org/wiki/Dynamic-link_library#Import_libraries)) during the build of `main.exe`. One can be made from a *def file* (module definition file), which is a text file, using a library manager that knows about the MSVC world that DMD inhabits. Let `libx.def` be a def file for `libx.dll`, and let `libx.lib` be an implib for `libx.dll`.

The def file would usually be created by `gcc` given 
`-Wl,--output-def=libx.def` when `libx.dll` is linked. And an 
implib can be created from it using `dlltool` which is 
distributed with that mingw64 `gcc`.[¹](#one)
```
$ dlltool -D libx.dll -d libx.def -l libx.lib -m i386:x86-64
```
Alternatively, the MS librarian `lib` can be used.[²](#two) 
[³](#three)
```
$ lib -nologo -machine:x64 -def:libx.def -out:libx.lib
```
Now when main.exe is built, it just needs to link to that import 
library and we're in business.
```
$ dmd main.d other.d header.di libx.lib
$ ./main #works
```
Now suppose we move to C++. If we make an import library as 
above, then a build of `main.exe` will not link, because the 
`gcc`-mangled names in the implib `libx.lib` do not match the 
MSVC-mangled names supplied by DMD.

*We can fix this by modifying the def file and producing an 
implib containing the MSVC-mangled names in place of the 
corresponding `gcc`-mangled names!*

An implib contains each name to link to paired with the 
corresponding location of the function in the dynamic library 
that name refers to. Concretely `libx.lib` contains each 
`gcc`-mangled name paired with the location in `libx.dll` of the 
corresponding function. So the problem is solved if the 
`gcc`-mangled names are replaced by the corresponding 
MSVC-mangled names in the implib `libx.lib`.

There are many ways to do this! However, there's a 
[mechanism](https://learn.microsoft.com/en-us/cpp/build/reference/exports?view=msvc-170#remarks) in a def file to do just that.

Here's the def file `libx.def` for my toy `libx.dll` generated by 
`g++ -shared libx.o -o libx.dll -Wl,--output-def=libx.def`.
```
EXPORTS
     _Z11complicatedi @1
```
Here `_Z11complicatedi` is the `gcc`-mangle of `int 
complicated(int)` and `@1` is its export ordinal. Unfortunately, 
`other.d` expects this function to be mangled as 
`?complicated@@YAHH@Z`, as this is the MSVC-mangle of `int 
__cdecl complicated(int)`[⁴](#four) and comes from `extern(C++) 
int complicated(int);` in `header.di`.

Editing `libx.def` into
```
EXPORTS
     ?complicated@@YAHH@Z=_Z11complicatedi @1
```
substitutes the MSVC-mangled name on the left for the 
`gcc`-mangled name on the right when generating the implib 
`libx.lib`. Using the MS librarian as before and building 
`main.exe` removes the linking error and the result just works.

*However, while using `dlltool` or `llvm-dlltool` as before 
produces implibs that satisfy the linker, the resulting `main.exe` 
did nothing when run in my toy example, simply returning to the 
prompt with no output, as of 2024-02-29.*

A `libx.def`, and hence a `libx.lib`, for any `main.exe` and 
`libx.dll`, with many substitution lines placed in the def file, 
could be mechanically generated once and for all. Or 
`libx.def` and `libx.lib` could be rebuilt on the fly as new 
symbols are used while the DMD project is being written.
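
The mechanical generation of substitution lines can be sketched in a few lines of bash and awk. This is only a sketch under assumptions: the function name `rewrite_def` and the two-column map file (one `gcc`-mangled name and its MSVC-mangled counterpart per line, separated by whitespace) are made up for illustration, and the def file is assumed to have the simple one-export-per-line layout that `gcc` produced for the toy example.

```bash
#!/bin/bash
# rewrite_def MAPFILE DEFFILE   (hypothetical helper, not part of any tool)
# MAPFILE: "<gcc-mangled> <MSVC-mangled>" per line (assumed format).
# Prints DEFFILE with every mapped export rewritten as
#   <MSVC-mangled>=<gcc-mangled> <rest of the original line, e.g. @ordinal>
rewrite_def() {
    awk '
        # First operand: load the gcc -> MSVC name map.
        mode == "map" { if (NF >= 2) msvc[$1] = $2; next }
        # Second operand: rewrite export lines whose name is in the map.
        ($1 in msvc) {
            rest = ""
            for (i = 2; i <= NF; i++) rest = rest " " $i
            printf "    %s=%s%s\n", msvc[$1], $1, rest
            next
        }
        # Anything else (EXPORTS, unmapped names) passes through unchanged.
        { print }
    ' mode=map "$1" mode=def "$2"
}
```

Redirecting the output to a new file gives a def file that can take the place of the original when the implib is built; exports with no entry in the map are left exactly as they were.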

Using the MS `dumpbin` tool produces text from which MSVC-mangled 
symbols can be extracted, along with their demanglings. So if the 
DMD project is compiled to a lib using the `-lib` option, so that 
it builds even though linking would fail, then a table of 
(unmangled, MSVC-mangled) name pairs for linkage can be 
constructed automatically by running `dumpbin` on the resulting 
`main.lib` and tearing up the resulting text. Similarly, the 
utility `nm` can be used to produce a table of (unmangled, 
`gcc`-mangled) pairs from `libx.dll`, and that can be combined 
with the text of `libx.def` to produce the modified `libx.def` 
with the necessary substitutions as in the example above.
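
Combining the two tables amounts to a join on the unmangled signature. The following is a minimal sketch of that step only, under stated assumptions: the function name `join_tables` and the tab-separated table format are hypothetical, and both tables are assumed to have already been extracted from the tool output and normalised so that the same function has the same signature text in each (the tools do not spell signatures identically, and that clean-up is not shown).

```bash
#!/bin/bash
# join_tables GCC_TABLE MSVC_TABLE   (hypothetical helper)
# Each table: "<signature><TAB><mangled name>" per line (assumed format).
# Prints "<gcc-mangled> <MSVC-mangled>" for every signature found in both,
# in the order the signatures appear in MSVC_TABLE.
join_tables() {
    awk -F '\t' '
        # First operand: remember the gcc-mangled name for each signature.
        side == "gcc" { gcc[$1] = $2; next }
        # Second operand: emit a pair when the signature is known.
        side == "msvc" && ($1 in gcc) { print gcc[$1], $2 }
    ' side=gcc "$1" side=msvc "$2"
}
```

Signatures that appear in only one table are silently dropped, which matches the idea that only symbols the DMD project actually uses need a substitution line.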

A script could do this and then run `lib` to build the import 
library on the fly during a build. Or, if the library's bindings 
are all in a D header file already, say `header.di`, then that 
could be used to produce the pairs of unmangled and 
MSVC-mangled names once and for all, and the corresponding 
`libx.def` file then used to produce an implib `libx.lib` that 
could be used with `libx.dll` indefinitely.

Lots of possibilities here!

There is a library distributed with mingw64 `gcc` to [demangle 
MSVC-mangled 
names](https://mingw-w64.sourceforge.net/libmangle/index.html), 
though I did not use it. So in principle the substitutive def 
file could be made using just `nm` to dump the MSVC-mangled 
binaries, so no MS tools are needed to make it.

Of course what we really need to know is the extent to which 
cross calling actually works for various C++ constructs. I'd be 
grateful if anyone who finds this out would post it here. I'm 
not a C++ fan, so I'm not the person to do this.

___
[[1]](#1) This worked with my toy example, but there are claims 
online that dlltool is unreliable, in which case 
[llvm-dlltool](https://github.com/ldc-developers/llvm-project/releases/download/ldc-v14.0.0/llvm-14.0.0-windows-x64.7z) might be better. They both have the same command line, and I could distinguish no difference between them in my toy examples.

[[2]](#2) It is handy to have a bash script that puts the 
directory containing `lib.exe` at the front of your MSYS2 path 
before executing it, so as to avoid polluting that deliberately 
isolated path with MS related executables. This technique can be 
used for the other MS tools mentioned above. For example, 
`~/bin/lib`, made executable, could contain the following, with 
`VCBIN` appropriately set in `~/.bash_profile` as an MSYS2 path 
obtained from the Windows path to `lib.exe`'s directory using the 
`cygpath` utility that comes with MSYS2. Note that it says 
`lib.exe` in the script, not `lib`, to avoid accidental recursion.
```
#!/bin/bash
PATH="$VCBIN:$PATH"
lib.exe "$@"
```
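
For completeness, setting `VCBIN` might look like the line below. This is a sketch only: the Windows path is a placeholder, not a real install location, and has to be replaced by wherever `lib.exe` actually lives on the machine in question.

```bash
# In ~/.bash_profile. 'C:\path\to\msvc\bin' is a placeholder (assumption);
# cygpath -u converts a Windows path into the equivalent MSYS2 (unix-style) path.
export VCBIN="$(cygpath -u 'C:\path\to\msvc\bin')"
```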

[[3]](#3) Avoid using DOS style options like e.g. `/nologo` in 
favor of unix style options like `-nologo`, because MSYS2 tries to 
helpfully modify command lines and regards `/nologo` as an 
MSYS2 path, which will be converted to a Windows path before 
executing the command.

[[4]](#4) It seems this is because `__cdecl` is the default 
calling convention for (mingw64) `gcc` and DMD's `extern(C++)` 
assumes this, and MSVC-mangling always includes the calling 
convention in the signature being mangled, even though 
`gcc`-mangling does not if it is the default of `__cdecl`.

