Function calls overhead
AlbertG
albert.guiman at protonmail.com
Mon Jun 23 18:08:04 UTC 2025
Hi everyone,
I am currently working on templatizing the casting hooks in DMD.
The goal of templatizing the runtime hooks was to improve runtime
performance by moving type checks to compile time. This has
worked well for the array-related hooks, but most of the casting
hooks need runtime type information (e.g. to perform dynamic
casts), so the potential gains there are much more limited. It
still makes sense to templatize them for consistency and
maintainability.
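To illustrate why the gains are limited, here is a minimal sketch
of a templated cast helper. `castSketch` is a hypothetical name
and this is not the druntime implementation: it only shows that
an upcast can be resolved with `static if` at compile time, while
a downcast still has to consult the object's dynamic type at
runtime.

```d
class A {}
class B : A {}
class C : B {}

// Hypothetical templated helper, for illustration only.
To castSketch(To, From)(From obj)
    if (is(To == class) && is(From == class))
{
    static if (is(From : To))
    {
        // Upcast: From implicitly converts to To, so the check is
        // settled at compile time and no runtime lookup is needed.
        return obj;
    }
    else
    {
        // Downcast: validity depends on the dynamic type, so this
        // sketch simply defers to the built-in runtime cast.
        return cast(To) obj;
    }
}

void main()
{
    A ac = new C();
    assert(castSketch!B(ac) !is null);      // dynamic type C derives from B
    assert(castSketch!A(new B()) !is null); // upcast, compile-time branch
    assert(castSketch!C(new A()) is null);  // an A is not a C
}
```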
To assess the performance impact of templatization, I have run
some benchmarks. One that caught my attention was the one for
`_d_class_cast`. The benchmarked code is as follows:
```d
class A {}
class B : A {}
class C : B {}
void main()
{
    // Static type is A, dynamic type is C.
    A ac = new C();
    for (auto cnt = 0; cnt < 256; ++cnt)
    {
        // Dynamic downcast being benchmarked.
        B b = cast(B) ac;
    }
}
```
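For anyone who wants to time a loop like this themselves, a
harness could look roughly like the sketch below. It assumes
`StopWatch` from `std.datetime.stopwatch`; the helper name
`timeCasts` and the `hits` counter are hypothetical and only
there so the loop has an observable result. It is not the exact
harness behind the numbers in this post, and it reports a total
time rather than an average and standard deviation.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

class A {}
class B : A {}
class C : B {}

// Hypothetical helper: times `runs` repetitions of the
// 256-iteration cast loop and returns the elapsed time.
Duration timeCasts(size_t runs)
{
    A ac = new C();
    size_t hits; // counts successful casts
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. runs)
    {
        for (auto cnt = 0; cnt < 256; ++cnt)
        {
            B b = cast(B) ac;
            if (b !is null)
                ++hits;
        }
    }
    sw.stop();
    assert(hits == runs * 256); // every cast must have succeeded
    return sw.peek();
}

void main()
{
    import std.stdio : writefln;

    auto elapsed = timeCasts(100_000);
    writefln("256 iterations @ 100000 runs: total time = %s ms",
             elapsed.total!"msecs");
}
```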
I measured the time it takes to run the benchmarked loop
**100_000** times, and the results are the following:
- Template vs Non-template - raw:
```bash
============================================================
Testing non-template hook
256 iterations @ 100000 runs: average time = 103.8ms; std dev = 1.16619
============================================================
Testing template hook
256 iterations @ 100000 runs: average time = 172.3ms; std dev = 0.9
============================================================
libdruntime.a size: old=16103056 B / new=16297864 B => 1.21% change
libphobos2.a size: old=55382078 B / new=55637346 B => 0.46% change
libphobos2.so size: old=8586016 B / new=8606224 B => 0.24% change
```
- Template vs Non-template - with `_d_class_cast` inlined, so
without the overhead of the wrapper function:
```bash
============================================================
Testing non-template hook
256 iterations @ 100000 runs: average time = 101.6ms; std dev = 0.663325
============================================================
Testing template hook
256 iterations @ 100000 runs: average time = 138.6ms; std dev = 1.2
============================================================
libdruntime.a size: old=16103056 B / new=16231972 B => 0.80% change
libphobos2.a size: old=55382078 B / new=55552132 B => 0.31% change
libphobos2.so size: old=8586016 B / new=8599512 B => 0.16% change
```
- Template vs Non-template - with both `_d_class_cast` and
`_d_class_cast_impl` inlined, so no extra function calls at all:
```bash
============================================================
Testing non-template hook
256 iterations @ 100000 runs: average time = 103.5ms; std dev = 1.74643
============================================================
Testing template hook
256 iterations @ 100000 runs: average time = 96.9ms; std dev = 1.44568
============================================================
libdruntime.a size: old=16103056 B / new=16232328 B => 0.80% change
libphobos2.a size: old=55382078 B / new=55553686 B => 0.31% change
libphobos2.so size: old=8586016 B / new=8603512 B => 0.20% change
```
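As you can see, the extra function calls add significant
overhead: the template hook is about 66% slower than the
non-template one in the raw build (172.3ms vs 103.8ms), about 36%
slower with `_d_class_cast` inlined (138.6ms vs 101.6ms), and
about 6% faster once `_d_class_cast_impl` is inlined as well
(96.9ms vs 103.5ms). The size increase is also slightly lower
when inlining is applied, but this might be particular to
druntime/phobos: they may cast between many different pairs of
types, so there are few repeated template instances whose code
could be shared if the hooks were not inlined.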
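As a rough picture of what inlining the two layers means, here is
a minimal sketch using `pragma(inline, true)`. The names
`castWrapper` and `castImpl` are hypothetical stand-ins for a
wrapper and its implementation function, not the real hooks, and
the bodies are reduced to the built-in cast so the example stays
self-contained.

```d
class A {}
class B : A {}
class C : B {}

// Hypothetical inner layer standing in for an implementation
// function; the runtime type check still happens in here.
pragma(inline, true)
To castImpl(To)(Object obj)
{
    return cast(To) obj;
}

// Hypothetical outer wrapper standing in for the hook entry
// point; with both layers inlined, the call site pays for no
// extra function calls on top of the cast itself.
pragma(inline, true)
To castWrapper(To, From)(From obj)
{
    return castImpl!To(obj);
}

void main()
{
    A ac = new C();
    B b = castWrapper!B(ac);
    assert(b !is null);                     // C derives from B
    assert(castWrapper!C(new A()) is null); // an A is not a C
}
```

The trade-off is the one visible in the size figures: each call
site gets its own copy of the inlined code instead of sharing one
function body.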
Overall, my suggestion is to fully inline the casting functions
into the main `_d_cast` hook to minimize the overhead, at the
cost of a possible increase in binary size. What are your
thoughts on this? Do you have any better ideas?