Sum Types - first draft
Paul Backus
snarwin at gmail.com
Tue Sep 10 16:20:55 UTC 2024
On Tuesday, 10 September 2024 at 04:06:16 UTC, Walter Bright
wrote:
> https://github.com/WalterBright/documents/blob/96bca2f9f3520cf53ed5c4dec8e5e2d855e64e66/sumtype.md
### Summary of comments
* Special cases are bad.
* New capabilities should ideally be general-purpose, not
sumtype-specific.
* Sumtype syntax should be modeled after unions, not enums.
### Re: std.sumtype limitations
> * std.sumtype cannot include regular enum members
True, but you can get equivalent semantics using empty structs.
For example, this enum:
    enum Foo : ubyte { a, b }
...could be translated to this SumType:
    struct A {}
    struct B {}
    alias Foo = SumType!(A, B);
Currently, the SumType occupies more storage space than the enum,
because it is forced to allocate 1 byte of storage to give the
empty struct objects a unique address. If D had a feature like
C++'s [[no_unique_address]] attribute [1], these two
representations could be made completely identical.
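The translation above can be checked with a small sketch. The enum is renamed `FooEnum` here only so both declarations fit in one module, and `name` is a hypothetical helper. The size comparison uses `>` rather than an exact figure, since the exact layout is up to the std.sumtype implementation.

```d
import std.sumtype;

enum FooEnum : ubyte { a, b }

struct A {}
struct B {}
alias Foo = SumType!(A, B);

// An empty struct still occupies one byte in D.
static assert(A.sizeof == 1);
static assert(FooEnum.sizeof == 1);
// The SumType stores a tag in addition to that byte.
static assert(Foo.sizeof > FooEnum.sizeof);

// Matching on the empty structs plays the role of switching on the enum.
string name(Foo f)
{
    return f.match!(
        (A a) => "a",
        (B b) => "b"
    );
}

unittest
{
    assert(name(Foo(A())) == "a");
    assert(name(Foo(B())) == "b");
}
```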
> * std.sumtype cannot optimize the tag out of existence, for
> example, when having:
>
> enum Option { None, int* Ptr }
A built-in sum type would not be able to do this either, because
in D, every possible bit pattern of pointer size, including null,
is a potentially-valid int* value.
The reason Rust is able to perform this optimization is that Rust
has non-nullable reference types [2]. If D had non-nullable
pointer types, then std.sumtype could perform the same
optimization using reflection and `static if`.
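As a rough sketch of what "reflection and `static if`" could look like in library code: `NonNull` and `Option` below are hypothetical types written for illustration only. They are not part of Phobos or std.sumtype, and a library wrapper like this cannot give the compiler-enforced guarantee that a real non-nullable pointer type would.

```d
// Hypothetical non-nullable pointer wrapper (checked only at construction).
struct NonNull(T)
{
    private T* ptr;

    this(T* p)
    {
        assert(p !is null, "NonNull requires a non-null pointer");
        ptr = p;
    }

    ref T get() { return *ptr; }
}

// Reflection: detect whether a type is an instance of NonNull.
enum isNonNull(T) = is(T == NonNull!U, U);

// Hypothetical option type that omits the tag when null is a free bit pattern.
struct Option(T)
{
    private T payload;

    static if (isNonNull!T)
    {
        // The null pattern is never a valid payload, so it can encode "none".
        bool isSome() const { return payload.ptr !is null; }
    }
    else
    {
        // Every bit pattern may be a valid payload, so a separate tag is needed.
        private bool hasValue;
        bool isSome() const { return hasValue; }
    }

    static Option none() { return Option.init; }

    static Option some(T value)
    {
        Option o;
        o.payload = value;
        static if (!isNonNull!T)
            o.hasValue = true;
        return o;
    }
}

// The tag disappears only for the non-nullable case.
static assert(Option!(NonNull!int).sizeof == (int*).sizeof);
static assert(Option!(int*).sizeof > (int*).sizeof);

unittest
{
    int x = 42;
    auto present = Option!(NonNull!int).some(NonNull!int(&x));
    auto absent = Option!(NonNull!int).none();
    assert(present.isSome);
    assert(!absent.isSome);
    // A null int* is still a legitimate "some" value for the tagged variant.
    assert(Option!(int*).some(null).isSome);
}
```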
> * cannot produce compile time error if not all the arms are
> accounted for in a pattern match rather than a thrown exception
> [...]
> * an int and a pointer cannot both be in a sumtype and be safe
Dennis has already addressed these, and his responses are correct.
### Re: Description
> Member functions of field declarations are restricted the same
> way union member functions are.
> [...]
> Members of sumtypes cannot have copy constructors, postblits,
> or destructors.
std.sumtype does not have these limitations, and having built-in
sumtypes limited like this would be a significant step backwards.
If you want to start with a proof-of-concept -preview
implementation that lacks these features, that's fine--I did the
same with the `sumtype` dub package. Support for members with
postblits was added in v0.5.0, and support for copy constructors
took all the way until v1.0.0. But the DIP should be clear that
these limitations will only be temporary.
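For reference, here is a minimal sketch of std.sumtype holding a member with a destructor. `Handle` and its `released` counter are hypothetical names used only for this illustration. The runtime check is deliberately loose (`>= 1`), because how many temporaries are destroyed along the way depends on the library's internals.

```d
import std.sumtype;
import std.traits : hasElaborateDestructor;

// Hypothetical resource type with a destructor.
struct Handle
{
    static int released; // counts destructor runs on live handles
    int fd;

    ~this()
    {
        if (fd != 0)
            ++released;
    }
}

alias Value = SumType!(int, Handle);

// The SumType picks up a destructor because one of its members has one.
static assert(hasElaborateDestructor!Value);

unittest
{
    Handle.released = 0;
    {
        Value v = Handle(3);
    }
    // The stored Handle was destroyed when v went out of scope.
    assert(Handle.released >= 1);
}
```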
> A special case of sumtypes will enable use of non-null pointers.
Unprincipled special cases like this are bad language design.
Non-null pointers are a generally-useful language feature, even
outside of sumtypes. If they're worth doing, they're worth doing
properly.
> A new expression, QueryExpression, is introduced to enable
> querying a sumtype to see if it contains a specified member.
Is this really necessary if we're already planning to add pattern
matching?
> SumTypeBody:
> `{` SumTypeMembers `}`
>
> [...]
>
> sumtype Option(T) { None, Some(T) }
Using enum-style syntax here is a big mistake, IMO. Sumtypes
should use the same AggregateBody syntax as structs and unions.
Advantages of AggregateBody:
* It's amenable to metaprogramming. Inside an AggregateBody, you
can use `static if`, `static foreach`, `mixin`, and so on. With
enum-style syntax, your options are greatly reduced.
* It would allow sumtypes to have user-defined member functions,
including operator overloads. (This is a limitation of
std.sumtype that I have personally received several complaints
about.)
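To illustrate the second point, this is roughly the workaround needed today to attach a member function to a std.sumtype value: wrap it in a struct. `Shape`, `Rect`, `Square`, and `area` are hypothetical names; this is a sketch of one common pattern, not the only way to do it.

```d
import std.sumtype;

struct Rect { double width, height; }
struct Square { double side; }

// Wrapper struct: exists only so that `area` can be a member function.
struct Shape
{
    alias Data = SumType!(Rect, Square);
    Data data;

    this(Rect r) { data = Data(r); }
    this(Square s) { data = Data(s); }

    double area()
    {
        return data.match!(
            (Rect r) => r.width * r.height,
            (Square s) => s.side * s.side
        );
    }
}

unittest
{
    auto a = Shape(Rect(2.0, 3.0));
    auto b = Shape(Square(2.0));
    assert(a.area() == 6.0);
    assert(b.area() == 4.0);
}
```

With an AggregateBody, `area` could be declared directly inside the sumtype and the wrapper would be unnecessary.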
The only disadvantage is that you lose the ability to mix named
integer values (like None, above) with typed members (like
Some(T)).
However, there is a simple solution to this, which is to allow
the programmer to declare fields of type `void`:
    sumtype Option(T)
    {
        void none;
        T some;
    }
This does not have to be a special-case feature of sumtypes; see
the abandoned "Give unit type semantics to void" DIP [3] for a
detailed description of how this could work as a general language
feature.
> The most pragmatic approach for now is to simply disallow
> taking the address of or a reference to a member of a SumType
> in @safe code.
This is one valid approach. The other is to make writing to a
sumtype value that contains pointers or references @system.
Keep in mind that merely calling a member function of a struct or
class instance requires taking a reference to it, since the
`this` parameter is passed by reference. So this limitation is
actually quite severe.
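A short sketch of why this is so: inside a struct member function, `this` refers to the original object, so its address is directly observable. `Counter` is a hypothetical type used only for illustration, and the code is ordinary @system code.

```d
struct Counter
{
    int n;

    // Mutates the caller's object, because `this` is passed by reference.
    void bump() { ++n; }

    // The reference can even be turned into a pointer to the caller's object.
    Counter* address() { return &this; }
}

unittest
{
    Counter c;
    c.bump();
    assert(c.n == 1);
    assert(c.address() is &c);
}
```

If `Counter` were stored in a sumtype, either call would involve taking a reference to a sumtype member.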
> But since a subtype with only enum members can be implemented
> as an enum, the compiler should do that rewrite. Similarly, a
> SumType with only one field declaration should be rewritten as
> a struct (and the tag can be omitted). Furthermore, a subtype
> with an enum member with a value of 0 and a field declaration
> that is a pointer can be rewritten as just a pointer.
Again, special cases like this are bad language
design--especially in a language like D with powerful reflection
and metaprogramming.
It's also inconsistent with existing language features. For
example, if I declare a type like this:
    union Example { int n; }
...the compiler does not magically rewrite it as a struct, even
though it's functionally equivalent to one.
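This can be observed through reflection, which is one reason a silent rewrite would be visible to user code:

```d
union Example { int n; }

// The declared kind is preserved: still a union, not a struct.
static assert(is(Example == union));
static assert(!is(Example == struct));
// The layout is nonetheless the same as a struct with a single int field.
static assert(Example.sizeof == int.sizeof);

unittest
{
    Example e;
    e.n = 7;
    assert(e.n == 7);
}
```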
### References
1. [[no_unique_address]]:
https://en.cppreference.com/w/cpp/language/attributes/no_unique_address
2. Non-nullable references:
https://doc.rust-lang.org/std/primitive.reference.html
3. Give unit type semantics to void:
https://github.com/dkorpel/DIPs/blob/dc1495cc2239729adb270012995c76809fe7f08c/DIPs/DIP1NNN-DK.md