Naive node.js faster than naive LDC2?
H. S. Teoh
hsteoh at quickfur.ath.cx
Sat Aug 22 00:10:43 UTC 2020
On Fri, Aug 21, 2020 at 04:49:44PM -0700, H. S. Teoh via Digitalmars-d wrote:
[...]
> Using a class for Complex (and a non-final one at that!!) introduces
> tons of allocation overhead per iteration, plus virtual function call
> overhead. You should be using a struct instead. I betcha this one
> change will make a big difference in performance.
[...]
OK, so I copied the code and changed the class to struct, and compared
the results. Both versions are compiled with ldc2 -O3.
class version:
7 secs, 125 ms, 608 μs, and 9 hnsecs
7 secs, 155 ms, 328 μs, and 6 hnsecs
7 secs, 158 ms, 966 μs, and 4 hnsecs
struct version:
6 secs, 55 ms, 140 μs, and 4 hnsecs
6 secs, 125 ms, 974 μs, and 5 hnsecs
6 secs, 126 ms, 945 μs, and 4 hnsecs
For performance comparisons, take the best of n (because the others are
merely measuring more system noise). Taking the best of 3, the struct
version's run time is about 15% lower than the class version's (6.055 s
vs. 7.126 s).
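For anyone who wants to check the best-of-n arithmetic, here is a minimal sketch in Python (used only for quick arithmetic, not part of the benchmark itself). It assumes the timings are text in the format shown above; the helper names `to_seconds`, `best_of` and `reduction_percent` are made up for illustration.

```python
import re

# Seconds per unit, for the units that appear in the timings above.
# (Assumption: only the plural forms "secs", "ms", "μs", "hnsecs" occur.)
UNIT_SECONDS = {"secs": 1.0, "ms": 1e-3, "μs": 1e-6, "hnsecs": 1e-7}

def to_seconds(text):
    """Convert e.g. '7 secs, 125 ms, 608 μs, and 9 hnsecs' to seconds."""
    total = 0.0
    for value, unit in re.findall(r"(\d+)\s*(secs|ms|μs|hnsecs)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total

def best_of(runs):
    """Best of n: the minimum run time, i.e. the run with the least noise."""
    return min(to_seconds(r) for r in runs)

def reduction_percent(baseline, improved):
    """Run-time reduction of `improved` relative to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0

# Timings copied from the ldc2 -O3 runs above.
class_runs = [
    "7 secs, 125 ms, 608 μs, and 9 hnsecs",
    "7 secs, 155 ms, 328 μs, and 6 hnsecs",
    "7 secs, 158 ms, 966 μs, and 4 hnsecs",
]
struct_runs = [
    "6 secs, 55 ms, 140 μs, and 4 hnsecs",
    "6 secs, 125 ms, 974 μs, and 5 hnsecs",
    "6 secs, 126 ms, 945 μs, and 4 hnsecs",
]

class_best = best_of(class_runs)
struct_best = best_of(struct_runs)
reduction = reduction_percent(class_best, struct_best)

print(f"class best:  {class_best:.3f} s")
print(f"struct best: {struct_best:.3f} s")
print(f"run-time reduction: {reduction:.1f}%")
```

The percentage is computed relative to the slower (class) time, so it reads as "how much less time the struct version takes", not as a speedup ratio.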
I thought it might make a difference to optimize for my CPU with
-mcpu=native, so here are the numbers:
class version:
7 secs, 100 ms, 602 μs, and 6 hnsecs
7 secs, 100 ms, 437 μs, and 7 hnsecs
7 secs, 121 ms, 594 μs, and 4 hnsecs
struct version:
6 secs, 73 ms, 534 μs, and 3 hnsecs
5 secs, 662 ms, 626 μs, and 5 hnsecs
6 secs, 103 ms, 871 μs, and 2 hnsecs
Again taking the best of 3, that's about a 20% reduction in run time
(5.663 s vs. 7.100 s) from changing class to struct.
Just for laughs, I tested with dmd -O -inline:
class version:
7 secs, 255 ms, 748 μs, and 5 hnsecs
7 secs, 249 ms, 683 μs, and 9 hnsecs
7 secs, 593 ms, 847 μs, and 8 hnsecs
struct version:
7 secs, 646 ms, 685 μs, and 5 hnsecs
7 secs, 618 ms, 642 μs, and 7 hnsecs
7 secs, 606 ms, 85 μs, and 4 hnsecs
Surprisingly, the class version does *better* than the struct version
when compiled with dmd. (Wow, is dmd codegen *that* bad that it
outweighs even class allocation overhead?? :-D) But both are worse than
even the class version with ldc2 -O3 (even without -mcpu=native).
So yeah. I wouldn't touch dmd with a 10-foot pole when it comes to
runtime performance. Taking the best of 3 for each, the struct version
compiled with `ldc2 -O3 -mcpu=native` beats the struct version compiled
with dmd by a 26% margin. That's pretty sad.
T
--
An imaginary friend squared is a real enemy.