DMD now incorporates a disassembler
max haughton
maxhaton at gmail.com
Sun Jan 9 06:04:25 UTC 2022
On Sunday, 9 January 2022 at 02:58:43 UTC, Walter Bright wrote:
>
> I've never seen one. What's the switch for gcc to do the same
> thing?
>
For GCC/Clang you'd want -S (and then -masm=intel to make the
output ~~beautiful to nobody but the blind~~ readable). This
dumps the output to a file, which isn't exactly the same as what
-vasm does, but I have already begun piping the -vasm output to a
file anyway, since (say) hello world yields a thousand lines of
output, which are much easier to consume in a text editor.
To do it with ldc, the flag is `--output-s`. I have opened a PR to
make the ldc dmd-compatibility wrapper (`ldmd2`) mimic -vasm.
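As a rough cheat sheet (the file names `app.c`/`app.d` are just
placeholders, and older Clang releases may spell the Intel-syntax
flag differently):
```
# GCC / Clang: write Intel-syntax assembly to app.s
gcc   -O2 -S -masm=intel app.c
clang -O2 -S -masm=intel app.c

# LDC: write assembly to app.s
ldc2 -O2 --output-s app.d

# DMD: -vasm prints to stdout, so redirect it
dmd -c -vasm app.d > app.asm
```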
Intel (and to a lesser extent Clang) actually annotate the
generated assembly with comments intended to be read by humans.
For example, Intel C++ (which is in the process of being replaced
by a rebadged Clang, also sold as Intel C++) prints its estimates
of the branch probabilities (hopeless unless you are using PGO,
but still):
```
test al, al #5.8
je ..B1.4 # Prob 22% #5.8
# LOE rbx rbp r12 r13 r14 r15
# Execution count [7.80e-01]
```
You can also ask the compiler to generate an optimization report
inline with the assembly code. This *is* useful when tuning, since
you can tell what the compiler is or isn't getting right (e.g.
working out where to force loop unrolling). The Intel compiler
also has a reputation for having an arsenal of dirty tricks to
make your code "faster", which it will deploy in the hope that you
don't notice that (say) your floating point numbers are now less
precise.
`-qopt-report-phase=vec` yields:
```
# optimization report
# LOOP WITH UNSIGNED INDUCTION VARIABLE
# LOOP WAS VECTORIZED
# REMAINDER LOOP FOR VECTORIZATION
# MASKED VECTORIZATION
# VECTORIZATION HAS UNALIGNED MEMORY REFERENCES
# VECTORIZATION SPEEDUP COEFFECIENT 3.554688
# VECTOR TRIP COUNT IS ESTIMATED CONSTANT
# VECTOR LENGTH 16
# NORMALIZED VECTORIZATION OVERHEAD 0.687500
# MAIN VECTOR TYPE: 32-bits integer
vpcmpuq k1, zmm16, zmm18, 6 #5.5
vpcmpuq k0, zmm16, zmm17, 6 #5.5
vpaddq zmm18, zmm18, zmm19 #5.5
vpaddq zmm17, zmm17, zmm19 #5.5
kunpckbw k2, k0, k1 #5.5
vmovdqu32 zmm20{k2}{z}, ZMMWORD PTR [rcx+r8*4] #7.9
vpxord zmm21{k2}{z}, zmm20, ZMMWORD PTR [rax+r8*4] #7.9
vmovdqu32 ZMMWORD PTR [rcx+r8*4]{k2}, zmm21 #7.9
add r8, 16 #5.5
cmp r8, rdx #5.5
jb ..B1.15 # Prob 82% #5.5
```
People don't seem to care about SPEC numbers too much anymore,
but the Intel compilers still have many features for gaming
standard benchmark scores.
http://www.spec.org/cpu2006/results/res2007q3/cpu2006-20070821-01880.html
If you looked at this, you'd think Intel had just managed a huge
improvement on `libquantum` which we can all get on our own code,
but it turns out they simply told the compiler to automagically
parallelize it, while the result is still nominally reported as a
single process.
See https://stackoverflow.com/questions/61016358/why-can-gcc-only-do-loop-interchange-optimization-when-the-int-size-is-a-compile for more overfitting.
> Compilers that take a detour through an assembler to generate
> code are inherently slower.
Certainly, although in my experience not by much. Time spent in
the assembler is dominated by time spent in the linker, and by
just about everywhere else in the compiler (especially when you
turn optimizations on). Hello World spends about 4ms in the
assembler on my machine.
GCC and Clang have very different architectures in this regard
but end up being pretty similar in terms of compile times. The
linker is an exception to that rule of thumb, however, in that the
LLVM linker is much faster than any current GNU offering.
>> It doesn't have a distinct IR like LLVM does but the final
>> stage of the RTL is basically a 1:1 representation of the
>> instruction set:
>
> That looks like intermediate code, not assembler.
It is the (final) intermediate code, but it's barely intermediate
at this stage, i.e. these are effectively just the target
instructions printed with Lisp-like syntax.
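As an illustration (a hand-abridged sketch, so the exact insn
numbers, pattern names and annotations will differ), a late RTL
dump from `-fdump-rtl-final` for a function that just returns zero
looks something like:
```
(insn 7 6 8 2 (set (reg:SI 0 ax [orig:84 <retval> ] [84])
        (const_int 0 [0])) "app.c":2:12 {*movsi_internal}
     (nil))
(insn 14 13 15 2 (use (reg/i:SI 0 ax)) "app.c":2:12 -1
     (nil))
```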
It is, helpfully, quite obfuscated: some of that is technical
baggage, and some of it is due to the way GCC was explicitly
designed to be difficult to consume.
I'm __not__ suggesting any normal programmer should use it; I'm
just showing what GCC does since I mentioned LLVM.
Anyway, I've been playing with -vasm and it seems pretty good so
far. There are some formatting issues which shouldn't be hard to
fix at all (this is why we asked for some basic tests of the shape
of the output), but so far I've only found one situation (touch
wood) where it actually gets the instruction *wrong*.
Testing it has also led me to find some bugs in the dmd inline
assembler, which I am in the process of filing.