second draft: add Bitfields to D

Sun Apr 28 13:32:44 UTC 2024

On Sunday, April 28, 2024 12:44:41 AM MDT Walter Bright via dip.development 
wrote:
> On 4/27/2024 12:12 AM, Jonathan M Davis wrote:
> > Now, if we want to do something like have extern(C) bitfields and
> > extern(D) bitfields so that we can have clean and consistent behavior
> > in normal D code
>
> D used to have its own function call ABI, because I thought I'd make a clean
> and consistent one.
>
> It turned out, nobody cared about clean and consistent. They wanted C
> compatibility. For example, debuggers could not handle anything other than
> what the associated C compiler emitted, regardless of what the debug info
> spec says.
>
> There really is not a clean and consistent layout. There is only C
> compatibility. Just like we do for endianness and alignment.
>
> All of the portability issues people have mentioned are easily dealt with.
>
> There is always writing functions that do shifts and masks as a last resort.
> (Shifts and masks is what the code generator does anyway, so this won't
> cost any performance.)

In this particular case, as I understand it, there are use cases that
definitely need to be able to have a guaranteed bit layout (e.g.
serialization code). So, I don't think that this is quite the same situation
as something like the call ABI. Even if a particular call ABI might
theoretically be better, it's not something that code normally cares about
in practice so long as it works, whereas some code will actually care what
the exact layout of bitfields is. The call ABI is largely a language
implementation detail, whereas the layout of bitfields actually affects the
behavior of the code.

It seems to me that we're dealing with three use cases here:

1. Code that is specifically binding to C bitfields. It needs to match what
the C compiler does, or it won't work. That comes with whatever pros and
cons the C layout has, but since the D code needs to match the C layout to
work, we just have to deal with whatever the layout is, and realistically,
the D code using it should not be written to care what the layout is,
because it could differ across OSes and architectures.

2. Code that needs a guaranteed bit layout, because it's actually taking the
integers that the bitfields are compacted into and storing them elsewhere
(e.g. storing the data on disk or sending it across the network). What C
does with bitfields for such code is utterly irrelevant, and it's
undesirable to even attempt compatibility. The bits need to be laid out
precisely in the way that the programmer indicates.

3. Code that just wants to store bits in a compact manner, and how that's
done doesn't particularly matter as long as the code just operates on the
individual bitfields and doesn't actually do anything with the integer
values that they're compacted into where the layout would matter.
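
To make use case #2 concrete, here is a minimal sketch of doing the shifts
and masks by hand so that the layout is whatever the programmer says it is.
The Header type, its field names, and the bit positions are all made up for
illustration; nothing here comes from the DIP.

```d
import std.stdio : writefln;

// Hypothetical one-byte header with a layout fixed by the programmer:
// bits 0-2 = kind, bits 3-6 = len, bit 7 = urgent.
struct Header
{
    ubyte raw; // the value that is actually written to disk or the wire

    private enum uint kindShift = 0;
    private enum uint kindMask = 0x7;
    private enum uint lenShift = 3;
    private enum uint lenMask = 0xF;
    private enum uint urgentShift = 7;
    private enum uint urgentMask = 0x1;

    // Extract a field: shift it down to bit 0, then mask off the rest.
    private uint get(uint shift, uint mask) const
    {
        return (raw >> shift) & mask;
    }

    // Store a field: clear its bits in raw, then OR in the new value.
    private void set(uint shift, uint mask, uint value)
    {
        raw = cast(ubyte)((raw & ~(mask << shift)) | ((value & mask) << shift));
    }

    @property uint kind() const { return get(kindShift, kindMask); }
    @property void kind(uint v) { set(kindShift, kindMask, v); }

    @property uint len() const { return get(lenShift, lenMask); }
    @property void len(uint v) { set(lenShift, lenMask, v); }

    @property bool urgent() const { return get(urgentShift, urgentMask) != 0; }
    @property void urgent(bool v) { set(urgentShift, urgentMask, v ? 1 : 0); }
}

void main()
{
    Header h;
    h.kind = 5;
    h.len = 9;
    h.urgent = true;
    writefln("raw = 0x%02X", h.raw); // prints "raw = 0xCD"
}
```

Because the shifts and masks are spelled out, the value of raw is the same
on every target, which is the property that serialization code needs.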

For the third use case, we'd arguably be better off with a guaranteed bit
layout, so that it would be consistent across OSes and
architectures, and anyone who accidentally wrote code that relied on the bit
layout wouldn't have issues as a result (similar to how we make it so that
long is guaranteed to be 64 bits across OSes and architectures regardless of
what C does; we avoid whole classes of bugs that way). If I understand
correctly, it's the issues that come from accidentally relying on the exact
bit layout when it's not guaranteed which are why folks like Steven are
arguing that it's a terrible idea to follow C's layout.

However, it's also true that since such code in theory doesn't care what the
bit layout is (since it's just using bitfields for compact storage and not
for something like serialization), the third use case could be solved with
either C-compatible bitfields or with bitfields which have a guaranteed
layout. It would be less error-prone (and thus more desirable) if the bit
layout were consistent, but as long as code doesn't accidentally depend on
the layout, it shouldn't matter.

So, use case #3 could be solved with either C-compatible bitfields or
bitfields with a guaranteed layout. However, use cases #1 and #2 are
completely incompatible, and we therefore need separate solutions for them.

For C compatibility, the obvious solution is to have the compiler deal with
it like this DIP is doing. It already has to deal with C compatibility for a
variety of things, and it's just going to be far easier and cleaner to have
the compiler set up to provide C-compatible bitfields than it is to try to
provide a library solution. I wouldn't expect a library solution to cover
all of the possible targets correctly, whereas it should be much more
straightforward for the compiler to do it.

The issue then is what to do about use case #2, where the bit layout needs
to be guaranteed.

I get the impression that you favor leaving the guaranteed bit layout to a
library solution, since you don't think that that use case matters much,
whereas you think that C compatibility matters a great deal, and you don't
think that the issues with accidentally relying on the layout when it's not
guaranteed are a big enough concern to avoid using C bitfields for code that
just wants to compact the bits. On the other hand, a number of the folks in
this thread don't think that C compatibility matters and don't want the bugs
that come from accidentally relying on the bit layout when it's not
guaranteed, so they're arguing for just making our bitfields have a
guaranteed layout and not worrying about C.

Personally, I'm inclined to argue that it would just be better to treat this
like we do extern(C++). extern(C++) structs and classes have whatever tweaks
are necessary to make them work with C++, whereas extern(D) code does what
we want to do with D types. We can do the same with extern(C) bitfields and
extern(D) bitfields. That way, we get C compatibility for the code that
needs it and a guaranteed bit layout for the code that needs that. And since
the guaranteed layout would be the default, we'd largely avoid bugs related
to relying on the bit layout when it's not guaranteed. It would be like how
D code in general uses long rather than c_long, so normal D code can rely on
the size of long and avoid the bugs that come with the type's size varying
depending on the target, whereas the code that actually needs C
compatibility uses c_long and takes the risks that come with a variable
integer size, because it has to. The issues with C bitfields would be
restricted to the code that actually needs the compatibility. It would also
make it cleaner to write code that has a guaranteed bit layout than it would
be with a library solution, since it could use the nice syntax too rather
than treating it as a second-class citizen.
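
As a small illustration of the long vs. c_long comparison (just a sketch;
the printed value depends on the target, so I'm not claiming a particular
output):

```d
import core.stdc.config : c_long;
import std.stdio : writeln;

// D guarantees this on every target, so normal D code can rely on it.
static assert(long.sizeof == 8);

void main()
{
    // c_long follows the target's C compiler, so this is 4 or 8
    // depending on the platform.
    writeln("c_long.sizeof = ", c_long.sizeof);
}
```

The idea would be for extern(D) bitfields to play the role of long here and
for extern(C) bitfields to play the role of c_long.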

However, in terms of what's actually necessary, I think that realistically,
extern(C) bitfields need to be in the language like this DIP is proposing,
since it's just too risky to do that with a library solution, whereas
extern(D) bitfields _can_ be solved with a library solution like they are
right now. I don't think that leaving the guaranteed layout to a library is
the best solution, but adding C-compatible bitfields to the language is
certainly better than what we have right now, since we don't have
C-compatible bitfields anywhere at the moment (outside of a preview switch).
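
For reference, the library solution that I have in mind is
std.bitmanip.bitfields in Phobos. A minimal sketch of using it (the Flags
type and its field names are made up for illustration):

```d
import std.bitmanip : bitfields;
import std.stdio : writeln;

// Hypothetical struct; the field widths must add up to 8, 16, 32, or 64.
struct Flags
{
    mixin(bitfields!(
        uint, "kind",   3,
        uint, "len",    4,
        bool, "urgent", 1));
}

// All three fields share a single byte of storage.
static assert(Flags.sizeof == 1);

void main()
{
    Flags f;
    f.kind = 5;
    f.len = 9;
    f.urgent = true;
    writeln(f.kind, " ", f.len, " ", f.urgent);
}
```

It works, but the string mixin is obviously a lot clunkier than the
declaration syntax that the DIP provides.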

In any case, it seems like the core issue that's resulting in most of the
debate over this DIP is how important some people think that it is to have a
guaranteed bit layout by default so that bugs which come from relying on a
layout that isn't guaranteed will be avoided. You don't seem to think that
that's much of a concern, whereas some of the other folks think that it's a
big concern.

Either way, I completely agree that we need a C-compatible solution in the
language so that we can sanely bind to C code that uses bitfields.

- Jonathan M Davis