Improve the OOP ABI

deadalnix deadalnix at gmail.com
Thu Sep 28 12:42:04 UTC 2023


Hi all,

I have been arguing for quite some time about the fact that there 
is a well of improvement to make to D that are not necessarily 
breaking changes or languages addition - and that, in fact, D 
having more feature than most language, adding feature is clearly 
not where the impact is.

Today I would like to provide a concrete example of things that 
could be dramatically improved, with no breakage except at the 
ABI level - which we already do not guarantee. Let's go over 
these changes.

1/ Downcast to final classes.

```d
class A {}
final class B : A {}

auto foo(A a) {
     return cast(B) a;
}
```

This generates a call into the runtime, making things completely 
opaque to the optimizer. The algorithm used by the runtime is 
linear. But there is a dumb constant time solution. because B is 
final, a is either an instance of B, or it is not (as in, it 
cannot be an instance of a subclass of B). Therefore, we expect 
the codegen to look like the following instead:

```d
auto foo(A a) {
     return typeid(a) is typeid(B) ? a : null;
}
```

This obviously won't compile but you get the idea. Not only this 
is constant time, but very simple too, and visible to the 
optimizer, which means the check can be folded by the opitimizer 
after other transforms, for instance inlining.

2/ Inline ClassInfo with the vtbl.

The current layout of objects is as follow:

```
+-----------+    +-----------+    +-----------+
|  __vtbl   +--->|  typeid   +--->| ClassInfo |
+-----------+    +-----------+    |           |
| __monitor |    |  method0  |    |    ...    |
+-----------+    +-----------+    |           |
|  field0   |    |  method1  |    +-----------+
+-----------+    +-----------+
|  field1   |    |    ...    |
+-----------+    +-----------+
|    ...    |
+-----------+
```

This causes a ton of extra indirections that are not necessary. 
instead the following layout ought to be used:

```
+-----------+    +-----------+
|  __vtbl   +--->| ClassInfo |
+-----------+    |           |
| __monitor |    |    ...    |
+-----------+    |           |
|  field0   |    +-----------+
+-----------+    |  method0  |
|  field1   |    +-----------+
+-----------+    |  method1  |
|    ...    |    +-----------+
+-----------+    |    ...    |
                  +-----------+
```

Alternatively, it is possible to have the pointer point at the 
first method and subtract to get the typeid. The important part 
is that there is only one indirection now.

3/ Downcast.

Currently, we use a linear algorithm to downcast and we reach 
this algorithm via a call int he runtime. This prevent the 
optimizer from doing anything about it, in addition to be slow. 
The problem is similar to 1/. The whole check can be made trivial 
if we let the compiler prebake some data for us, namely an array 
of primary parents. Let's see what it looks like in the first 
example, when b is not final.

```d
class A {}
class B : A {}

auto foo(A a) {
     return cast(B) a;
}
```

Which we can do as follow:

```d
auto foo(A a) {
     // The primaries generated by the compiler for us.
     auto tidA = typeid(A);
     assert(tidA.primaries = [tidA]);
     auto tidB = typeid(B);
     assert(tidB.primaries = [tidA, tidB]);

     auto t = typeid(a);
     auto depth = tidB.primaries.length - 1;

     if (t.primary.length <= depth) {
         return null;
     }

     if (t.primary[depth] == tidB) {
         return a;
     }

     return null;
}
```

This is starting to look good, now we have general downcast in a 
form that is not only really fast to perform, but also completely 
transparent to the optimizer.

4/ Downcast to interfaces.

This is getting a bit more tricky, so I won't expand this one 
fully, but it is also possible to do much better on this front as 
well. The obvious complication that that interfaces allow for 
multiple inheritance.

The first obvious optimization is to consider the chain of 
leftmost (or alternatively the longest chain, which allows to 
cull more candidates faster) inheritance the primaries for the 
interface and eliminate the whole branch at once using the same 
strategy as 3/.

Now we are left with possible secondaries match. Here, no 
possibility to remain O(1), we'll have to loop over the other 
interfaces and repeat the matching process. Thankfully very 
branch hierarchy are uncommon in practice, but even then, we can 
use this one weird trick to dramatically cull out the search 
space: bloom filters.

Make the bloom filter 64 bits and simply cull via `(targetBloom & 
aggregateBloom) == targetBloom` and you can eliminate a good 
chunk of the search right there.

I hope this is the kind of change we can see in D in the future. 
This is the kind of changes that would make D better in practice, 
not the next cool feature, but that the current, existing 
feature, have an quality of implementation that is in line with 
what can be expected in a modern language.


More information about the Digitalmars-d mailing list