Improve the OOP ABI
deadalnix
deadalnix at gmail.com
Thu Sep 28 12:42:04 UTC 2023
Hi all,
I have been arguing for quite some time about the fact that there
is a well of improvement to make to D that are not necessarily
breaking changes or languages addition - and that, in fact, D
having more feature than most language, adding feature is clearly
not where the impact is.
Today I would like to provide a concrete example of things that
could be dramatically improved, with no breakage except at the
ABI level - which we already do not guarantee. Let's go over
these changes.
1/ Downcast to final classes.
```d
class A {}
final class B : A {}
auto foo(A a) {
return cast(B) a;
}
```
This generates a call into the runtime, making things completely
opaque to the optimizer. The algorithm used by the runtime is
linear. But there is a dumb constant time solution. because B is
final, a is either an instance of B, or it is not (as in, it
cannot be an instance of a subclass of B). Therefore, we expect
the codegen to look like the following instead:
```d
auto foo(A a) {
return typeid(a) is typeid(B) ? a : null;
}
```
This obviously won't compile but you get the idea. Not only this
is constant time, but very simple too, and visible to the
optimizer, which means the check can be folded by the opitimizer
after other transforms, for instance inlining.
2/ Inline ClassInfo with the vtbl.
The current layout of objects is as follow:
```
+-----------+ +-----------+ +-----------+
| __vtbl +--->| typeid +--->| ClassInfo |
+-----------+ +-----------+ | |
| __monitor | | method0 | | ... |
+-----------+ +-----------+ | |
| field0 | | method1 | +-----------+
+-----------+ +-----------+
| field1 | | ... |
+-----------+ +-----------+
| ... |
+-----------+
```
This causes a ton of extra indirections that are not necessary.
instead the following layout ought to be used:
```
+-----------+ +-----------+
| __vtbl +--->| ClassInfo |
+-----------+ | |
| __monitor | | ... |
+-----------+ | |
| field0 | +-----------+
+-----------+ | method0 |
| field1 | +-----------+
+-----------+ | method1 |
| ... | +-----------+
+-----------+ | ... |
+-----------+
```
Alternatively, it is possible to have the pointer point at the
first method and subtract to get the typeid. The important part
is that there is only one indirection now.
3/ Downcast.
Currently, we use a linear algorithm to downcast and we reach
this algorithm via a call int he runtime. This prevent the
optimizer from doing anything about it, in addition to be slow.
The problem is similar to 1/. The whole check can be made trivial
if we let the compiler prebake some data for us, namely an array
of primary parents. Let's see what it looks like in the first
example, when b is not final.
```d
class A {}
class B : A {}
auto foo(A a) {
return cast(B) a;
}
```
Which we can do as follow:
```d
auto foo(A a) {
// The primaries generated by the compiler for us.
auto tidA = typeid(A);
assert(tidA.primaries = [tidA]);
auto tidB = typeid(B);
assert(tidB.primaries = [tidA, tidB]);
auto t = typeid(a);
auto depth = tidB.primaries.length - 1;
if (t.primary.length <= depth) {
return null;
}
if (t.primary[depth] == tidB) {
return a;
}
return null;
}
```
This is starting to look good, now we have general downcast in a
form that is not only really fast to perform, but also completely
transparent to the optimizer.
4/ Downcast to interfaces.
This is getting a bit more tricky, so I won't expand this one
fully, but it is also possible to do much better on this front as
well. The obvious complication that that interfaces allow for
multiple inheritance.
The first obvious optimization is to consider the chain of
leftmost (or alternatively the longest chain, which allows to
cull more candidates faster) inheritance the primaries for the
interface and eliminate the whole branch at once using the same
strategy as 3/.
Now we are left with possible secondaries match. Here, no
possibility to remain O(1), we'll have to loop over the other
interfaces and repeat the matching process. Thankfully very
branch hierarchy are uncommon in practice, but even then, we can
use this one weird trick to dramatically cull out the search
space: bloom filters.
Make the bloom filter 64 bits and simply cull via `(targetBloom &
aggregateBloom) == targetBloom` and you can eliminate a good
chunk of the search right there.
I hope this is the kind of change we can see in D in the future.
This is the kind of changes that would make D better in practice,
not the next cool feature, but that the current, existing
feature, have an quality of implementation that is in line with
what can be expected in a modern language.
More information about the Digitalmars-d
mailing list