Fixing the -march/-mcpu situation

David Nadlinger code at klickverbot.at
Wed Oct 2 10:01:28 PDT 2013


Currently, -march/-mcpu are pretty much broken in LDC. [1] The cause
of the confusion is that the LLVM-internal tools (llc, …), and LDC
along with them, interpret the options differently from GCC (and Clang):

For LLVM, -march selects the target architecture, i.e. x86, ARM, and
so on, and -mcpu selects specific CPUs (-mcpu=corei7) or features
(-mcpu=+sse42).

GCC, on the other hand, doesn't support multiple targets at the same
time, and its behavior differs slightly between the available targets.
In all cases, -march selects some sort of sub-target to use
for the compilation. For x86, these are the different instruction set
extensions/scheduling parameters (e.g. -march=corei7), where, quoting
gcc(1), »In contrast to -mtune=cpu-type, which merely tunes the
generated code for the specified cpu-type, -march=cpu-type allows GCC
to generate code that may not run at all on processors other than the
one indicated.«. As far as GCC targeting x86 is concerned, -mcpu is a
deprecated synonym for -mtune, but for other targets, it actually
changes the permissible instructions too (which is probably why it was
deprecated on x86).
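
To make the difference concrete, here is a rough Python model of the two
conventions as described above. It is only an illustration, not code from
LDC, LLVM or GCC; the function name and the role labels ("arch", "cpu",
"features", "tune") are made up for this sketch, and only the x86 flavor
of GCC is modeled.

```python
# Illustrative model only; names and role labels are assumptions of this sketch.

def interpret(convention, flag, value):
    """Return the role a flag plays under the given convention."""
    if convention == "llvm":
        if flag == "-march":
            # Selects the target architecture: x86, ARM, and so on.
            return "arch"
        if flag == "-mcpu":
            # A leading '+' marks a feature request, as in -mcpu=+sse42;
            # anything else is taken to be a CPU name, as in -mcpu=corei7.
            return "features" if value.startswith("+") else "cpu"
    elif convention == "gcc-x86":
        if flag == "-march":
            # Sub-target selection: the result may not run on other CPUs.
            return "cpu"
        if flag in ("-mtune", "-mcpu"):
            # On x86, -mcpu is a deprecated synonym for -mtune.
            return "tune"
    raise ValueError(f"unhandled combination: {convention} {flag}")


def may_change_instructions(role):
    """Tuning alone never widens the set of permitted instructions."""
    return role != "tune"


if __name__ == "__main__":
    for conv, flag, value in [
        ("llvm", "-march", "x86"),
        ("llvm", "-mcpu", "corei7"),
        ("gcc-x86", "-march", "corei7"),
        ("gcc-x86", "-mcpu", "corei7"),
    ]:
        role = interpret(conv, flag, value)
        print(conv, f"{flag}={value}", role, may_change_instructions(role))
```

The point of the model is that the same spelling, -march=corei7, is a
CPU selection under the GCC convention, while under the LLVM convention
-march only ever names an architecture.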

So, what should we do for LDC?
 (1) Follow the convention of the LLVM tools, because it seems like
the natural thing to do and the LLVM convention is arguably saner?
 (2) Change the meaning of the parameters to match GCC, because this
is what many users will probably expect?

If we go with (1), the flag for targeting the build host's CPU would
be "-mcpu=native", although we could also accept "-march=native" as an
alias to provide a "just works" experience for GCC users.
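
A minimal sketch of what such an alias could look like, again in Python
purely for illustration. The helper name normalize_target_flags is
hypothetical and this is not how LDC parses its command line; it only
shows the idea of special-casing "-march=native" while keeping the LLVM
meaning of -march for everything else.

```python
# Hypothetical helper, not LDC code: treat "-march=native" as a GCC-style
# spelling of "-mcpu=native" and pass every other argument through unchanged.

def normalize_target_flags(args):
    """Return a new argument list with -march=native rewritten."""
    normalized = []
    for arg in args:
        if arg == "-march=native":
            normalized.append("-mcpu=native")
        else:
            # -march=x86, -mcpu=corei7, source files, etc. stay as they are.
            normalized.append(arg)
    return normalized


if __name__ == "__main__":
    print(normalize_target_flags(["-O3", "-march=native", "app.d"]))
```

Only the exact string "-march=native" is rewritten, so an architecture
selection such as -march=x86 still follows the LLVM convention.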

In my opinion, fixing that situation is the single most important
issue to attack before pushing out a release (I've somewhat given up
on that AA issue [2] by now; I'm just not seeing the forest for the
trees). The main reason for that, besides the embarrassing fact that I
announced something in the last release notes that does not actually
work, is that people are often using LDC specifically for performance,
and extended instruction sets can make quite the difference here. Our
0.11.0 release was a step backwards for many people in that regard, as
we were always targeting the build host CPU before that (now, a
generic lowest-common-denominator CPU is assumed, as in most other
compilers).

David


[1] https://github.com/ldc-developers/ldc/issues/414
[2] https://github.com/ldc-developers/ldc/issues/407


More information about the digitalmars-d-ldc mailing list