Performance test of short-circuiting AliasSeq
Stefan Koch
uplink.coder at googlemail.com
Wed Jun 3 15:18:26 UTC 2020
On Wednesday, 3 June 2020 at 14:52:09 UTC, Stefan Koch wrote:
> The reason the old version of staticMap did not see the
> slowdown is because I didn't disable codegen.
>
> Code-gen inefficiencies that occur when emitting an unreasonable
> number of symbols tend to hide other problems.
>
> Here is a benchmark which does not rely on our branch
> but uses the released dmd 2.092.0
>
> Enjoy!
>
> uplink at uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine
> "./dmd_without_patch sm.d -c -o- -version=Walter"
> "./dmd_with_patch sm.d -c -o- -version=Walter" -m 90
> Benchmark #1: ./dmd_without_patch sm.d -c -o- -version=Walter
> Time (mean ± σ): 452.8 ms ± 7.5 ms [User: 415.5 ms,
> System: 37.4 ms]
> Range (min … max): 442.2 ms … 483.9 ms 90 runs
>
> Benchmark #2: ./dmd_with_patch sm.d -c -o- -version=Walter
> Time (mean ± σ): 455.1 ms ± 10.4 ms [User: 417.3 ms,
> System: 37.7 ms]
> Range (min … max): 441.5 ms … 489.2 ms 90 runs
>
> Summary
> './dmd_without_patch sm.d -c -o- -version=Walter' ran
> 1.00 ± 0.03 times faster than './dmd_with_patch sm.d -c -o-
> -version=Walter'
> uplink at uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine
> "./dmd_without_patch sm.d -c -o-" "./dmd_with_patch sm.d -c
> -o-" -m 90
> Benchmark #1: ./dmd_without_patch sm.d -c -o-
> Time (mean ± σ): 583.2 ms ± 11.0 ms [User: 529.9 ms,
> System: 53.1 ms]
> Range (min … max): 570.0 ms … 631.0 ms 90 runs
>
> Benchmark #2: ./dmd_with_patch sm.d -c -o-
> Time (mean ± σ): 584.3 ms ± 14.3 ms [User: 533.1 ms,
> System: 51.0 ms]
> Range (min … max): 566.5 ms … 657.9 ms 90 runs
>
> Summary
> './dmd_without_patch sm.d -c -o-' ran
> 1.00 ± 0.03 times faster than './dmd_with_patch sm.d -c -o-'
> uplink at uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine
> "./dmd_without_patch sm.d -c -o-" "./dmd_with_patch sm.d -c
> -o-" -m 90
> Benchmark #1: ./dmd_without_patch sm.d -c -o-
> Time (mean ± σ): 583.4 ms ± 10.5 ms [User: 529.2 ms,
> System: 54.0 ms]
> Range (min … max): 566.9 ms … 624.0 ms 90 runs
>
> Benchmark #2: ./dmd_with_patch sm.d -c -o-
> Time (mean ± σ): 585.9 ms ± 13.9 ms [User: 530.5 ms,
> System: 55.2 ms]
> Range (min … max): 565.0 ms … 631.7 ms 90 runs
>
> Summary
> './dmd_without_patch sm.d -c -o-' ran
> 1.00 ± 0.03 times faster than './dmd_with_patch sm.d -c -o-'
Disregard this one.
I had AliasSeq defined as:
    template AliasSeq(seq...) { enum AliasSeq = seq; }
which does not trigger the optimization.
When I instead define AliasSeq as:
    template AliasSeq(seq...) { alias AliasSeq = seq; }
the optimization triggers and you get:
uplink at uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine "./dmd_without_patch sm.d -c -o- -version=Walter" "./dmd_with_patch sm.d -c -o- -version=Walter" -m 50
Benchmark #1: ./dmd_without_patch sm.d -c -o- -version=Walter
  Time (mean ± σ): 296.2 ms ± 6.8 ms [User: 263.6 ms, System: 32.5 ms]
  Range (min … max): 285.7 ms … 330.8 ms 50 runs

Benchmark #2: ./dmd_with_patch sm.d -c -o- -version=Walter
  Time (mean ± σ): 301.4 ms ± 11.7 ms [User: 270.6 ms, System: 30.8 ms]
  Range (min … max): 285.6 ms … 333.3 ms 50 runs

Summary
  './dmd_without_patch sm.d -c -o- -version=Walter' ran
  1.02 ± 0.05 times faster than './dmd_with_patch sm.d -c -o- -version=Walter'
uplink at uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine "./dmd_without_patch sm.d -c -o-" "./dmd_with_patch sm.d -c -o-" -m 50
Benchmark #1: ./dmd_without_patch sm.d -c -o-
  Time (mean ± σ): 388.6 ms ± 8.6 ms [User: 346.5 ms, System: 42.2 ms]
  Range (min … max): 378.5 ms … 419.3 ms 50 runs

Benchmark #2: ./dmd_with_patch sm.d -c -o-
  Time (mean ± σ): 375.7 ms ± 9.9 ms [User: 332.8 ms, System: 42.8 ms]
  Range (min … max): 362.2 ms … 396.3 ms 50 runs

Summary
  './dmd_with_patch sm.d -c -o-' ran
  1.03 ± 0.04 times faster than './dmd_without_patch sm.d -c -o-'
This is somewhat consistent with the previous results.
The earlier run, in which the optimization did not trigger, shows no
measurable difference.
That means that if the optimization does not trigger, no
performance penalty is incurred FOR THIS TEST.
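
To make the difference between the two definitions concrete, here is a
minimal sketch of both forms side by side. EnumSeq is a hypothetical name
used only so that both templates can live in one file, and the Types and
Nested aliases are illustrative; none of them are taken from sm.d. The
sketch only shows what the alias form means at the language level. It does
not show how the patch detects it.

```d
// Enum form: per the results above, this one did NOT trigger the
// optimization. It is declared only for comparison and is never
// instantiated in this sketch. (EnumSeq is a hypothetical name.)
template EnumSeq(seq...)
{
    enum EnumSeq = seq;
}

// Alias form: the eponymous member is an alias of the sequence itself.
// This is the definition for which the optimization triggered.
template AliasSeq(seq...)
{
    alias AliasSeq = seq;
}

// Illustrative usage, assumed for this sketch only.
alias Types = AliasSeq!(int, long, string);
static assert(Types.length == 3);
static assert(is(Types[1] == long));

// Sequences expand automatically, so nesting one inside another flattens.
alias Nested = AliasSeq!(Types, char);
static assert(Nested.length == 4);
static assert(is(Nested[3] == char));
```

Since everything here is checked with static assert, compiling the file
with -c -o- (the same no-codegen flags used in the benchmarks) is enough to
exercise it.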
Another surprising thing: applying the patch somehow reduces the
size of the binary (by 2384 bytes), which just goes to show that
you really cannot tell right from wrong anymore with
modern optimizers.
-rwxrwxr-x 1 uplink uplink 19281504 Jun 3 16:30 dmd_without_patch
-rwxrwxr-x 1 uplink uplink 19279120 Jun 3 16:31 dmd_with_patch
My guess is that LLVM's inliner went less crazy because of an
unpredictable branch in there.