Shared libraries, symbol visibilities, Posix vs. Windows
kinke
noone at nowhere.com
Thu May 16 13:00:44 UTC 2024
This post is intended to shed some light on shared-library
details, common pitfalls, and compare Posix and Windows. It's
LDC-centric. I'm trying not to go into too many details, but
that's hard. :)
## Posix
I'm focusing on ELF here (Linux, BSDs, …), but Apple's Mach-O
seems to work analogously (except for no shared druntime/Phobos
support for macOS with DMD yet).
For symbols to be accessible from other binaries, they need to be
'dynamic symbols', which can e.g. be inspected via `objdump -T
<binary>` (or `readelf --dyn-syms <binary>`). A symbol becomes a
dynamic symbol if both of these requirements are met:
* object file: default (ELF) symbol visibility (`STV_DEFAULT`),
not hidden (`STV_HIDDEN`)
* binary: exporting these symbols at link-time via
`--export-dynamic` (all default-visibility symbols) or
selectively via `--export-dynamic-symbol[-list]` (linker support
varies)
  * `--export-dynamic` is the default setting when linking a
    shared library
  * but not for executables (DMD adds it implicitly to the linker
    command, LDC doesn't)
With LDC, the object-file symbol visibility is controlled by
`-fvisibility={public,hidden}` (analogous to gcc/clang -
controlling the default visibility for all symbol definitions),
as well as explicit `export` D visibility and the
`@ldc.attributes.hidden` UDA.
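
A minimal sketch of the three ways a definition can get its visibility, as I read the description above. The function names are made up for illustration. The `version (LDC)` fallback is only there so the snippet compiles with other compilers too; the placeholder UDA has no effect.

```d
version (LDC)
{
    import ldc.attributes : hidden;
}
else
{
    private enum hidden; // placeholder UDA, no effect (assumption for non-LDC builds)
}

// Explicit `export`: meant to be visible from other binaries,
// independent of the -fvisibility default.
export int publicApi(int x)
{
    return helper(x) + 1;
}

// Explicit @hidden: DSO-local, only resolved inside the same binary.
@hidden int helper(int x)
{
    return x * 2;
}

// No annotation: follows the -fvisibility={public,hidden} default.
int ordinary(int x)
{
    return x;
}
```

After linking, `objdump -T <binary>` should list `publicApi` but not `helper`; whether `ordinary` shows up depends on the `-fvisibility` setting used for compilation.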
The compiler doesn't need to know whether an external symbol is
going to be provided by some object file/static library or a
shared library at link-time (no 'import' complications at all;
`-dllimport` is ignored on Posix). On Posix, static and shared
libs are mostly interchangeable, everything 'just works'.
### Unifying duplicate dynamic symbols across the whole process
One important aspect is that the dynamic loader 'unifies' a dynamic
symbol if multiple binaries define it (probably using the first
encountered definition). So dynamic-symbol addresses are identical
in these binaries, and the binaries operate on the same shared
state (data symbols).
Say we have a D executable statically linked against the
`concurrency` dub library, and a shared D library that contains
its own `concurrency` library (linked statically into the shared
library). If both binaries export their `concurrency` symbols as
dynamic symbols, there's effectively a single shared
`concurrency` state for the whole process. So you don't *need* to
link executable and shared library against a shared `concurrency`
library to e.g. have a single `globalStopSource` instance for the
whole process.
If there are multiple copies of the same library in the whole
process (duplicate static libs), a potentially surprising pitfall
is that module constructors, CRT constructors etc. are still
invoked once per containing binary, so multiple times (and
operating on the same data). This can be even more surprising if
the static libs are compiled differently, e.g., with extra
`version`s for the static `concurrency` lib linked into the
shared library: loading the shared library can then end up invoking
the module constructors from the executable (if the `ModuleInfo`
data symbol, or the module constructor function itself, is a
dynamic symbol in both binaries). [We've had such a case at
Symmetry, so I'm not pulling this out of thin air.]
### Common practices
AFAIK, one usually doesn't bother with selective exports via
`-fvisibility=hidden`, just compiling with default
`-fvisibility=public` and thus exporting ~everything. `@hidden`
is handy for symbols that need to be DSO-local (to be resolved
inside the same binary only, not 'imported' or
unified/preempted), but that's an exceptional use case (LDC's
druntime has a few of these).
For D in particular, the stack traces in druntime depend on
dynamic symbols - the function names are only resolved if the
function is a dynamic symbol [while file+line infos are derived
from the DWARF debuginfos]. So using `-L--export-dynamic` for
linking executables isn't uncommon (default for DMD) to resolve
function names from the executable too. The downside is that it
prevents the linker from stripping unused symbols - dynamic
symbols aren't stripped, and accordingly neither are any
non-dynamic symbols that they reference.
Another D-specific aspect is that if a process consists of
multiple D binaries, they must share a single shared druntime
[compiled with `-version=Shared` for some important diffs between
static and shared druntime variants]. So if e.g. a D executable
supports plugins (loading shared D libraries at runtime), the
executable needs to be linked with `-link-defaultlib-shared`
explicitly (it's only the default when linking a shared library
via `-shared`), to link against the shared druntime and Phobos
libraries [separate for LDC, not a single merged `libphobos2.so`
as for DMD].
## Windows
On Windows, we are back in the stone age. Some
limitations/differences:
* Binaries cannot export more than 64K symbols.
* When linking a DLL implicitly (i.e., not loading it manually at
runtime and looking up the symbol address via
`GetProcAddress()`), you don't link against the DLL itself (as you
would against the .so/dylib on Posix), but have to use a separate
'import library' generated by the linker (`mylib.dll` with import
library `mylib.lib`).
* You can't link a DLL and have some symbols resolved at
load-time (to be provided by the loading process). All symbols
need to be resolved at link-time.
* The loader doesn't take care of resolving references to symbols
exported from other binaries; the compiler needs to emit code that
does it manually at runtime. Accordingly, there's no automatic
'unifying' of duplicate exported symbols.
With that ridiculous 64K-symbols limit, it's clear that we cannot
default to `-fvisibility=public` on Windows, otherwise you
wouldn't be able to link any binary with more than 64K symbol
definitions. [At Symmetry, we have a fat shared library, which on
Linux has more than 600K dynamic symbols; on Windows, we
explicitly export a handful of symbols only.] So one needs to
either resort to selective `export`s (e.g., for plugins with a
small number of exported functions only), or use a higher number
of smaller shared libraries explicitly compiled with
`-fvisibility=public` (such as the druntime and Phobos DLLs).
### Exports
There's no concept of object-file visibilities in COFF. Instead,
what happens is that the compiler embeds linker directives in the
object file if a symbol defined in that object file is to be
exported (`/EXPORT:foo`). AFAIK, you can't override or tweak this
at link-time later (as possible on Posix via
`--export-dynamic…`), so this is all controlled at compile-time
already. If there are exported symbols/linker directives, the
linker automatically generates an import library for the linked
executable/DLL.
### Imports
While on Posix there's no explicit importing, on Windows things
are totally different - if you want to directly access a symbol
defined in another binary, you need to use the import-symbol
indirection (symbol `foo` needs to be resolved as `*__imp_foo` -
at runtime, as `__imp_foo` is set by the system at startup).
The `export` visibility on Windows serves two purposes:
- For the object file defining an `export`ed symbol, it causes
the symbol to be dllexported from every binary that object file
is linked into.
- In other object files referencing that symbol, the symbol is
dllimported, unless the object file has been compiled together
(in the same compiler invocation) with the object file that
exports it. The assumption here is that all of the object files
produced in a single compiler invocation are linked together, not
ending up in different binaries. E.g., if you compile a static
library in a single compiler invocation, and export a symbol
explicitly, then all produced object files that don't define the
symbol reference it directly without dllimport (so to be resolved
inside the same binary at link-time). So you don't *have* to use
a .di header to replace an `export` definition with a declaration
- if the module defining the symbol isn't part of the current
compilation (not a root module, only D-imported), it's
dllimported automatically.
#### Functions
For functions, the import libraries fortunately contain
trampolines (with the original function names). When calling some
`foo` function exported by another binary, you can link that
binary's associated import library, which provides a `foo`
trampoline, which (presumably) loads `__imp_foo` and jumps to
that address. So calling/accessing some function in another
binary doesn't *require* any extra handling from the compiler.
Note that the function addresses will diverge across binaries (as
`&foo` might be a trampoline specific to the current binary),
unlike on Posix. [For LDC, I've had to adapt a single druntime
unittest, where the function identity/address mattered.] And
well, you're going through a trampoline instead of calling the
function directly, so this might come with a tiny performance
penalty.
#### Data
Data symbols on the other hand are a problem - trampolines aren't
an option because the indirection needs to be loaded at runtime
(so we need to *run* code for that, can't just access some
`__imp_foo` directly). In essence, the compiler needs to know in
advance if a data symbol will be imported from some other binary,
and then replace `foo` by `*__imp_foo`. That's pretty simple in
function bodies.
[References to such dllimported data symbols in static data
initializers on the other hand are a pain. E.g., an object file
may define a TypeInfo for some struct defined in another DLL, and
that `TypeInfo.initializer.ptr` needs to be set to the dllimported
init symbol. LDC keeps track of such references per object file
and emits a CRT constructor which performs the required
'relocations' manually, at runtime.]
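
To make the two cases concrete, here is a single-binary simulation, not what LDC actually emits. `imp_foo` stands in for the real `__imp_foo` slot, and `Info`/`patchImportedRefs` are hypothetical names; in a real DLL setup the loader fills the slot, whereas here it is simply initialized statically.

```d
// Stand-in for a data symbol that would be defined in *another* DLL.
__gshared int foo = 42;

// Stand-in for the `__imp_foo` slot (assumption: the loader would
// fill this in at load time; here it is a plain static initializer).
__gshared int* imp_foo = &foo;

// In function bodies the rewrite is simple: `foo` becomes `*__imp_foo`.
int readFoo()
{
    return *imp_foo;
}

// Static data referencing the imported symbol can't get the address
// at compile/link time, so the field starts out null...
struct Info
{
    const(int)* initPtr;
}

__gshared Info info;

// ...and is patched by a CRT constructor at startup, before main runs.
pragma(crt_constructor)
extern(C) void patchImportedRefs()
{
    info.initPtr = imp_foo;
}
```

The point of the sketch is only the shape of the problem: code can go through the indirection on every access, while statically initialized data needs a one-time fix-up at runtime.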
Note that there's no support for exporting/importing **TLS**
symbols at all (not in C++ either). Again, something that just
works on Posix. [IIRC, I've only had to adapt a single TLS variable
in druntime for now though, using a function returning a ref
instead.]
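
The workaround mentioned above, sketched with made-up names (`counter`, `tlsCounter`, `valueSeenByNewThread`); this is not the actual druntime change. Instead of exporting the thread-local variable, an exported function hands out a reference to it.

```d
import core.thread : Thread;

// Module-level variables are thread-local by default in D.
private int counter;

// Exported accessor replacing an (unsupported) exported TLS variable.
export ref int tlsCounter()
{
    return counter;
}

// Helper for demonstration: the value a freshly started thread observes.
int valueSeenByNewThread()
{
    int seen = -1;
    auto t = new Thread({ seen = tlsCounter(); });
    t.start();
    t.join();
    return seen;
}
```

Callers write `tlsCounter() = 5;` instead of `counter = 5;`. Each thread still gets its own instance, so a new thread sees the initial value rather than the caller's.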
Compared to C++, the situation is trickier for D, as we have a
bunch of implicit data symbols, like ModuleInfos, init symbols
and way more commonly used (and complicated!) TypeInfos.
### Keeping things reasonably simple with `-dllimport={none,all,defaultLibsOnly}`
The main problem on Windows is that the compiler needs to know in
advance if a data symbol will be imported from some other binary.
While you could provide the compiler with a fine-grained list of
modules/packages that are to be treated as external (ending up in
another binary), I've decided to go with a simpler scheme for
LDC, focusing on 2 use cases:
1. Building every library as its own shared library. For a dub
project, this would be building every direct and indirect
dependency as its own separate shared library (not really
feasible with dub today). Similar to a Linux distro package
manager with a central set of shared libraries.
   - This is what LDC defaults to with `-shared`, for symmetry
     with Posix.
   - Similar to how it just works on Posix: export everything
     (`-fvisibility=public`), and import all (`extern(D)`) data
     symbols that aren't defined in a compiled root module
     (`-dllimport=all`). No need for a carefully hand-crafted
     `export` library interface. This works best when compiling
     each library with a single compiler invocation (all modules
     contained in the shared library), but that isn't a requirement
     [an object file may then dllimport data symbols exported by
     separately compiled object files of the same library, with a
     linker warning 'importing locally defined symbol' - probably a
     slight performance penalty].
   - And also similar to Posix, there's a single state per
     library, because each library is present only once in the
     whole process (no duplicate static libraries with their own
     separate states).
   - With many smaller DLLs, the 64K-symbols limit should be
     manageable.
2. A process consisting of few larger shared libraries, each with
few selective/explicit `export`s only (`-fvisibility=hidden`),
but automatically importing all data symbols from druntime and
Phobos (`-dllimport=defaultLibsOnly` - basically treating a
module as binary-external if starting with `std.`, `core.` or
`ldc.`).
   - When linking a static library into such a binary, it must
     have been compiled with matching visibility options
     (`-fvisibility=hidden -dllimport=defaultLibsOnly`). Somewhat
     similar to how you have to compile C(++) code ending up in a
     shared Posix library with `-fPIC`.
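
The `defaultLibsOnly` rule of thumb from use case 2 can be written down as a tiny predicate. This is a hypothetical helper that merely restates the prefix rule given above; it is not LDC's implementation, which may well treat some modules differently.

```d
import std.algorithm.searching : startsWith;

// Hypothetical: would a module be treated as binary-external
// (data symbols dllimported) under -dllimport=defaultLibsOnly?
// Assumption: purely based on the fully qualified module name.
bool treatedAsExternal(string moduleName)
{
    // startsWith with several needles returns 0 if none matches.
    return moduleName.startsWith("std.", "core.", "ldc.") != 0;
}
```

So `std.stdio` and `core.thread` would count as external, while a dub dependency such as `concurrency.stoptoken` would be expected in the same binary.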
This makes it possible to use shared libraries on Windows quite
painlessly, all controlled by the `-fvisibility` and `-dllimport`
compile options, and optionally the D `export` visibility +
`@hidden` UDA.
What isn't supported is, for example, a dub project where some
deps are built as shared library (without selective/explicit
`export`s), and others as static libraries. Say, only using the
`concurrency` dub dependency as a shared library exporting
everything (to have a single process-global state for that
library on Windows too), and linking everything else statically.
That would require more fine-grained control over binary-external
modules, with an according combinatorial explosion (something
like `-dllimport=std.*,core.*,ldc.*,concurrency.*`).
## Templates
Similar to gcc/clang's `-fvisibility-inlines-hidden`, you can use
LDC's `-linkonce-templates` to NOT export any instantiated
symbols, so that each binary comes with its own instantiated
state and functions.
On Windows, without `-linkonce-templates`, there's again the
problem of importing instantiated data symbols. Such a symbol can
be instantiated and defined (possibly exported) in multiple
binaries, plus there's the template-codegen-culling mechanism in
the frontend. For somewhat predictable behavior, I've chosen to do a
sort of 'lightweight' `-linkonce-templates` for instantiated data
symbols, if the template *declaration* is in a binary-external
module. This means that there's one such instantiated data symbol
for each Windows binary that references it. A simplified example:
if Phobos declares a template with some counter global, and
multiple binaries compiled with `-dllimport=defaultLibsOnly`
instantiate it identically, they'll all have their own counter
globals. Again, on Posix, the loader unifies the instantiated
data symbol, everything just works. [More infos:
https://github.com/ldc-developers/ldc/issues/3931]
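
For illustration, this is the kind of template the counter example refers to. `Counter` and `bump` are made-up names, not actual Phobos declarations.

```d
// A template carrying a global: one `count` per instantiation.
template Counter(T)
{
    __gshared int count;
}

// Increments and returns the counter belonging to the given type.
int bump(T)()
{
    return ++Counter!T.count;
}
```

Within a single binary, `bump!int()` keeps incrementing the same `Counter!int.count`. If such a template were declared in a binary-external module and instantiated identically by several Windows binaries, each binary would increment its own copy, whereas on Posix the loader would unify them into one.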
## Example: SIL
For a project at Symmetry, we currently have the following
architecture, working on both Linux (DMD and LDC) and Windows
(LDC only):
* a bunch of thin frontends (executables and shared libraries),
* the core as a single fat shared library, with a handful of
explicitly `export`ed functions (and something akin to a `.di`
header as shared-lib interface), which all frontends are
implicitly linked against, and
* a bunch of plugins (shared libraries) which can be loaded
dynamically at runtime, each with a dozen (or so) explicitly
`export`ed functions (resolved via `GetProcAddress`/`dlsym`)
On Windows, *everything* (except for prebuilt druntime and Phobos
DLLs) is compiled with `-fvisibility=hidden
-dllimport=defaultLibsOnly`. All binaries share some base dub
dependencies that are all linked statically.
This is an evolution from a prior approach, where we had a
smaller core with about 25 plugins, and linked that core
statically into every frontend. The duplication of static
libraries (base dub dependencies) was much worse then, causing a
much higher overall bundle size. So we extracted the core as a
separate shared library and now link most former plugins
statically into that core.
Handling non-unified separate states on Windows can be a pain:
https://github.com/symmetryinvestments/concurrency/pull/88
The full bundle consists of about 200 dub libraries/executables,
so the alternative of building every dub dependency as its own
shared library with (on Windows) `-fvisibility=public
-dllimport=all` doesn't seem too attractive and hasn't been
tested yet; it would surely be a huge challenge. :)
## My 2 cents
As is hopefully clear by now, it's archaic Windows which
complicates matters enormously wrt. shared libraries. My strong
opinion on this is that the D language itself shouldn't cater to
its limitations - we try to do our best (with reasonable effort)
to make things work on Windows too (Rainer Schütze has been
working on adapting the LDC scheme for DMD, some things landed
already), but the OS is just too primitive to handle all cases
without too much Windows-only effort (like adding our own
D-specific extra indirection for all symbols to implement a
unified state, or wrapping TLS variables with functions - all
stuff the compiler could do, but just for a crappy operating
system?).