Performance test of short-circuiting AliasSeq

Wed Jun 3 15:18:26 UTC 2020

On Wednesday, 3 June 2020 at 14:52:09 UTC, Stefan Koch wrote:
> The reason the old version of staticMap did not see the 
> slowdown is because I didn't disable codegen.
>
> Code-gen inefficiencies that occur when emitting an unreasonable 
> number of symbols tend to hide other problems.
>
> Here is a benchmark which does not rely on our branch
> but uses the released dmd 2.092.0.
>
> Enjoy!
>
> uplink@uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine "./dmd_without_patch sm.d -c -o- -version=Walter" "./dmd_with_patch sm.d -c -o- -version=Walter" -m 90
> Benchmark #1: ./dmd_without_patch sm.d -c -o- -version=Walter
>   Time (mean ± σ):     452.8 ms ±   7.5 ms    [User: 415.5 ms, System: 37.4 ms]
>   Range (min … max):   442.2 ms … 483.9 ms    90 runs
>
> Benchmark #2: ./dmd_with_patch sm.d -c -o- -version=Walter
>   Time (mean ± σ):     455.1 ms ±  10.4 ms    [User: 417.3 ms, System: 37.7 ms]
>   Range (min … max):   441.5 ms … 489.2 ms    90 runs
>
> Summary
>   './dmd_without_patch sm.d -c -o- -version=Walter' ran
>     1.00 ± 0.03 times faster than './dmd_with_patch sm.d -c -o- -version=Walter'
> uplink@uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine "./dmd_without_patch sm.d -c -o-" "./dmd_with_patch sm.d -c -o-" -m 90
> Benchmark #1: ./dmd_without_patch sm.d -c -o-
>   Time (mean ± σ):     583.2 ms ±  11.0 ms    [User: 529.9 ms, System: 53.1 ms]
>   Range (min … max):   570.0 ms … 631.0 ms    90 runs
>
> Benchmark #2: ./dmd_with_patch sm.d -c -o-
>   Time (mean ± σ):     584.3 ms ±  14.3 ms    [User: 533.1 ms, System: 51.0 ms]
>   Range (min … max):   566.5 ms … 657.9 ms    90 runs
>
> Summary
>   './dmd_without_patch sm.d -c -o-' ran
>     1.00 ± 0.03 times faster than './dmd_with_patch sm.d -c -o-'
> uplink@uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine "./dmd_without_patch sm.d -c -o-" "./dmd_with_patch sm.d -c -o-" -m 90
> Benchmark #1: ./dmd_without_patch sm.d -c -o-
>   Time (mean ± σ):     583.4 ms ±  10.5 ms    [User: 529.2 ms, System: 54.0 ms]
>   Range (min … max):   566.9 ms … 624.0 ms    90 runs
>
> Benchmark #2: ./dmd_with_patch sm.d -c -o-
>   Time (mean ± σ):     585.9 ms ±  13.9 ms    [User: 530.5 ms, System: 55.2 ms]
>   Range (min … max):   565.0 ms … 631.7 ms    90 runs
>
> Summary
>   './dmd_without_patch sm.d -c -o-' ran
>     1.00 ± 0.03 times faster than './dmd_with_patch sm.d -c -o-'

Disregard this one.
I had AliasSeq defined as:
    template AliasSeq(seq...) { enum AliasSeq = seq; }
which does not trigger the optimization.
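
To make the distinction concrete, here is a minimal sketch of the alias-based definition together with a naive recursive staticMap that instantiates it. sm.d itself is not shown in this thread, so this is only an assumed stand-in for the kind of code being measured; the names Const and Mapped are made up for the example.

```d
// Hypothetical reduced test case, NOT the sm.d used in the benchmarks.

// Alias form of AliasSeq: this is the shape the patch is said to recognize.
template AliasSeq(seq...)
{
    alias AliasSeq = seq;
}

// Naive recursive staticMap, for illustration only.
// Each step instantiates AliasSeq once.
template staticMap(alias F, T...)
{
    static if (T.length == 0)
        alias staticMap = AliasSeq!();
    else
        alias staticMap = AliasSeq!(F!(T[0]), staticMap!(F, T[1 .. $]));
}

// Made-up mapping function: wraps a type in const.
alias Const(T) = const(T);

alias Mapped = staticMap!(Const, int, char, double);

// Compile-time checks that the mapping did what we expect.
static assert(Mapped.length == 3);
static assert(is(Mapped[0] == const(int)));
static assert(is(Mapped[2] == const(double)));
```

Compiling this with -c -o- (semantic analysis only, no codegen, as in the hyperfine runs) is enough to exercise the template instantiations.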

When I instead define AliasSeq as:
    template AliasSeq(seq...) { alias AliasSeq = seq; }

the optimization triggers and you get:

uplink@uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine "./dmd_without_patch sm.d -c -o- -version=Walter" "./dmd_with_patch sm.d -c -o- -version=Walter" -m 50
Benchmark #1: ./dmd_without_patch sm.d -c -o- -version=Walter
   Time (mean ± σ):     296.2 ms ±   6.8 ms    [User: 263.6 ms, System: 32.5 ms]
   Range (min … max):   285.7 ms … 330.8 ms    50 runs

Benchmark #2: ./dmd_with_patch sm.d -c -o- -version=Walter
   Time (mean ± σ):     301.4 ms ±  11.7 ms    [User: 270.6 ms, System: 30.8 ms]
   Range (min … max):   285.6 ms … 333.3 ms    50 runs

Summary
   './dmd_without_patch sm.d -c -o- -version=Walter' ran
     1.02 ± 0.05 times faster than './dmd_with_patch sm.d -c -o- -version=Walter'
uplink@uplink-black:~/d/dmd-master/dmd(stable)$ hyperfine "./dmd_without_patch sm.d -c -o-" "./dmd_with_patch sm.d -c -o-" -m 50
Benchmark #1: ./dmd_without_patch sm.d -c -o-
   Time (mean ± σ):     388.6 ms ±   8.6 ms    [User: 346.5 ms, System: 42.2 ms]
   Range (min … max):   378.5 ms … 419.3 ms    50 runs

Benchmark #2: ./dmd_with_patch sm.d -c -o-
   Time (mean ± σ):     375.7 ms ±   9.9 ms    [User: 332.8 ms, System: 42.8 ms]
   Range (min … max):   362.2 ms … 396.3 ms    50 runs

Summary
   './dmd_with_patch sm.d -c -o-' ran
     1.03 ± 0.04 times faster than './dmd_without_patch sm.d -c -o-'

This is somewhat consistent with the previous results.
The test that I did earlier, which does not trigger the 
optimization, shows no measurable difference.
That means that if the optimization does not trigger, no 
performance penalty is incurred FOR THIS TEST.
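
As a rough cross-check of hyperfine's summary lines, the ratios can be recomputed from the reported mean times. The helper name speedup is made up for this sketch; the numbers are copied from the 50-run output above, and the bounds in the static asserts are deliberately loose.

```d
// Hypothetical helper, not part of hyperfine or dmd.
// Returns how many times faster the second measurement is than the first.
double speedup(double slowerMs, double fasterMs) pure nothrow @nogc @safe
{
    return slowerMs / fasterMs;
}

// Mean times from the runs without -version=Walter.
enum withoutPatchMs = 388.6;
enum withPatchMs = 375.7;

// Mean times from the runs with -version=Walter
// (here the unpatched compiler had the lower mean).
enum walterWithoutPatchMs = 296.2;
enum walterWithPatchMs = 301.4;

// Evaluated at compile time via CTFE.
enum plainRatio = speedup(withoutPatchMs, withPatchMs);
enum walterRatio = speedup(walterWithPatchMs, walterWithoutPatchMs);

// Matches the "1.03" and "1.02" that hyperfine printed.
static assert(plainRatio > 1.03 && plainRatio < 1.04);
static assert(walterRatio > 1.01 && walterRatio < 1.02);
```

Both ratios sit inside the ± ranges hyperfine reports (0.04 and 0.05), so neither difference is large compared to the run-to-run noise.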

Another thing that's surprising is ... somehow applying the patch 
does reduce the size of the binary. Which just goes to show that 
you really cannot actually tell right from wrong anymore with 
modern optimizers.

-rwxrwxr-x 1 uplink uplink 19281504 Jun  3 16:30 dmd_without_patch
-rwxrwxr-x 1 uplink uplink 19279120 Jun  3 16:31 dmd_with_patch

My guess is that LLVM's inliner went less crazy because of an 
unpredictable branch in there.