I wonder how fast we'd do
KnightMare
black80 at bk.ru
Tue May 28 10:23:40 UTC 2019
On Tuesday, 28 May 2019 at 04:38:32 UTC, Andrei Alexandrescu
wrote:
> https://jackmott.github.io/programming/2016/07/22/making-obvious-fast.html
Results for:
https://pastebin.com/j0T0MRmA (Marco de Wild's code with small changes)
Windows Server 2019, i7-3615QM, DMD 2.086.0, LDC 1.16.0-b1
C:\content\downloadz\dlang>ldc2 -release -O3 -mattr=avx times2.d
C:\content\downloadz\dlang>times2.exe
t1=42 ms, 714 ╬╝s, and 9 hnsecs r=10922666154674544967680
t2=42 ms and 614 ╬╝s r=10922666154674544967680
t3=0 hnsecs r=0
t4=42 ms, 474 ╬╝s, and 8 hnsecs r=10922666154674544967680
C:\content\downloadz\dlang>dmd -release -O -mcpu=avx times2.d
C:\content\downloadz\dlang>times2.exe
t1=141 ms, 263 ╬╝s, and 5 hnsecs r=10922666154673907433000
t2=143 ms, 128 ╬╝s, and 9 hnsecs r=10922666154673907433000
t3=1 hnsec r=0
t4=491 ms, 829 ╬╝s, and 9 hnsecs r=10922666154673907433000
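A side note on the two r values above: LDC and DMD agree only in the leading digits. One possible cause (an assumption, not checked against the generated code) is that a vectorized loop adds the elements in a different order, and floating-point addition is not associative. A minimal sketch in Python, where lane_sum is a hypothetical stand-in for a SIMD loop that keeps one partial sum per lane:

```python
def sequential_sum(values):
    # Add the elements strictly left to right, like a plain scalar loop.
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    # Hypothetical model of a vectorized loop: element i goes to
    # partial sum i % lanes, and the partial sums are combined at the end.
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


# Chosen so that the rounding is easy to follow by hand:
# 1e16 + 1.0 rounds back to 1e16 in double precision.
values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))  # 1.0
print(lane_sum(values, 2))     # 2.0
```

Same input, same arithmetic, different order, different result. Whether this is what actually happens in times2.d would have to be checked in the assembly of both compilers.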
1) DMD and LDC produce different sums (probably fast-math, I don't know).
2) t3=0 and r=0 for d_with_sum. Let's look at the assembly LDC generates (-output-s):
.def _D6times210d_with_sumFNaNbNfAdZd;
.scl 2;
.type 32;
.endef
.section .text,"xr",discard,_D6times210d_with_sumFNaNbNfAdZd
.globl _D6times210d_with_sumFNaNbNfAdZd
.p2align 4, 0x90
_D6times210d_with_sumFNaNbNfAdZd:
vxorps %xmm0, %xmm0, %xmm0
retq // this means "return 0"? cool optimization
3) On Windows it would be better to print "us" instead of "μs": in a
console program (/SUBSYSTEM:CONSOLE) the "μ" is displayed as "╬╝", as in
the output above.
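The garbling in point 3 can be reproduced outside D. A small sketch in Python, assuming the console interprets the program's UTF-8 output bytes in an OEM code page such as 437 (the post does not say which code page was active; as_console_sees is a hypothetical helper name):

```python
def as_console_sees(text, code_page="cp437"):
    # Encode the text as UTF-8 (what the program writes), then decode the
    # same bytes with the console's code page (what the user ends up seeing).
    return text.encode("utf-8").decode(code_page)


print("μs".encode("utf-8"))    # b'\xce\xbcs'
print(as_console_sees("μs"))   # ╬╝s
print(as_console_sees("us"))   # us
```

The "μ" is two bytes in UTF-8 (0xCE 0xBC), and code page 437 maps those bytes to the two box-drawing characters seen in the timings. Plain ASCII "us" comes out the same under either interpretation, which is why it is the safer choice for console output.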