Sum Types - first draft

Paul Backus snarwin at gmail.com
Tue Sep 10 16:20:55 UTC 2024


On Tuesday, 10 September 2024 at 04:06:16 UTC, Walter Bright 
wrote:
> https://github.com/WalterBright/documents/blob/96bca2f9f3520cf53ed5c4dec8e5e2d855e64e66/sumtype.md

### Summary of comments

* Special cases are bad.
* New capabilities should ideally be general-purpose, not 
sumtype-specific.
* Sumtype syntax should be modeled after unions, not enums.

### Re: std.sumtype limitations

> * std.sumtype cannot include regular enum members

True, but you can get equivalent semantics using empty structs. 
For example, this enum:

     enum Foo : ubyte { a, b }

...could be translated to this SumType:

     struct A {}
     struct B {}
     alias Foo = SumType!(A, B);

Currently, the SumType occupies more storage space than the enum, 
because it is forced to allocate 1 byte of storage to give the 
empty struct objects a unique address. If D had a feature like 
C++'s [[no_unique_address]] attribute [1], these two 
representations could be made completely identical.
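
As a rough sketch of the translation above, the following compiles against Phobos' std.sumtype. The enum is renamed `FooEnum` here only so that both declarations can live in one module, and `nameOf` is a hypothetical helper added for illustration.

```d
// Run with: dmd -unittest -main -run example.d
import std.sumtype;

enum FooEnum : ubyte { a, b }   // the enum, renamed to avoid a clash

struct A {}
struct B {}
alias Foo = SumType!(A, B);     // the SumType translation

// Hypothetical helper: reports which member is active.
string nameOf(Foo f)
{
    return f.match!(
        (A _) => "a",
        (B _) => "b"
    );
}

// An empty struct still occupies one byte...
static assert(A.sizeof == 1);
// ...so the SumType (payload plus tag) is larger than the enum.
static assert(Foo.sizeof > FooEnum.sizeof);

unittest
{
    assert(nameOf(Foo(A())) == "a");
    assert(nameOf(Foo(B())) == "b");
}
```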

> * std.sumtype cannot optimize the tag out of existence, for 
> example, when having:
>
>       enum Option { None, int* Ptr }

A built-in sum type would not be able to do this either, because 
in D, every possible bit pattern of a pointer, null included, is 
a potentially-valid int* value.

The reason Rust is able to perform this optimization is that Rust 
has non-nullable reference types [2]. If D had non-nullable 
pointer types, then std.sumtype could perform the same 
optimization using reflection and `static if`.
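
To make the idea concrete, here is a minimal sketch of how a library type could drop the tag with `static if`. It assumes a hypothetical library-level `NonNull` wrapper standing in for a real non-nullable pointer type, and a hypothetical `Option` template. Neither is part of std.sumtype or the language.

```d
// Run with: dmd -unittest -main -run example.d

// Hypothetical stand-in for a non-nullable pointer type.
struct NonNull(T)
{
    private T* ptr;
    @disable this();                 // no default (null) construction
    this(T* p)
    {
        assert(p !is null, "NonNull requires a non-null pointer");
        ptr = p;
    }
    ref T get() { return *ptr; }
}

// Hypothetical Option that elides its tag when the payload can never be null.
struct Option(T)
{
    static if (is(T == NonNull!U, U))
    {
        private U* raw;              // null encodes "none"; no tag needed
        this(T value) { raw = value.ptr; }
        bool isNone() const { return raw is null; }
    }
    else
    {
        private T payload;
        private bool hasValue;       // explicit tag
        this(T value) { payload = value; hasValue = true; }
        bool isNone() const { return !hasValue; }
    }
}

// The non-null case is exactly one pointer wide; the general case is not.
static assert(Option!(NonNull!int).sizeof == (int*).sizeof);
static assert(Option!int.sizeof > int.sizeof);

unittest
{
    int x = 42;
    auto some = Option!(NonNull!int)(NonNull!int(&x));
    Option!(NonNull!int) none;
    assert(!some.isNone);
    assert(none.isNone);
}
```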

> * cannot produce compile time error if not all the arms are 
> accounted for in a pattern match rather than a thrown exception
> [...]
> * an int and a pointer cannot both be in a sumtype and be safe

Dennis has already addressed these, and his responses are correct.

### Re: Description

> Member functions of field declarations are restricted the same 
> way union member functions are.
> [...]
> Members of sumtypes cannot have copy constructors, postblits, 
> or destructors.

std.sumtype does not have these limitations, and having built-in 
sumtypes limited like this would be a significant step backwards.

If you want to start with a proof-of-concept -preview 
implementation that lacks these features, that's fine--I did the 
same with the `sumtype` dub package. Support for members with 
postblits was added in v0.5.0, and support for copy constructors 
took all the way until v1.0.0. But the DIP should be clear that 
these limitations will only be temporary.
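
For illustration, a small sketch of std.sumtype holding a member with a destructor. `Guard`, `Holder`, and `releasedAfterScope` are hypothetical names, and the example covers only the destructor case, not postblits or copy constructors.

```d
// Run with: dmd -unittest -main -run example.d
import std.sumtype;
import std.traits : hasElaborateDestructor;

// Hypothetical member type whose destructor records that it ran.
struct Guard
{
    bool* released;
    ~this()
    {
        if (released !is null)
            *released = true;
    }
}

alias Holder = SumType!(Guard, int);

// The SumType picks up a destructor from its member.
static assert(hasElaborateDestructor!Holder);

// Returns true if the Guard's destructor ran when the SumType left scope.
bool releasedAfterScope()
{
    bool flag = false;
    {
        auto h = Holder(Guard(&flag));
    }
    return flag;
}

unittest
{
    assert(releasedAfterScope());
}
```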

> A special case of sumtypes will enable use of non-null pointers.

Unprincipled special cases like this are bad language design. 
Non-null pointers are a generally-useful language feature, even 
outside of sumtypes. If they're worth doing, they're worth doing 
properly.

> A new expression, QueryExpression, is introduced to enable 
> querying a sumtype to see if it contains a specified member.

Is this really necessary if we're already planning to add pattern 
matching?

>     SumTypeBody:
>        `{` SumTypeMembers `}`
>
> [...]
>
>     sumtype Option(T) { None, Some(T) }

Using enum-style syntax here is a big mistake, IMO. Sumtypes 
should use the same AggregateBody syntax as structs and unions.

Advantages of AggregateBody:

* It's amenable to metaprogramming. Inside an AggregateBody, you 
can use `static if`, `static foreach`, `mixin`, and so on. With 
enum-style syntax, your options are greatly reduced.

* It would allow sumtypes to have user-defined member functions, 
including operator overloads. (This is a limitation of 
std.sumtype that I have personally received several complaints 
about.)
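
For context, this is roughly what adding a member function looks like with std.sumtype today: the SumType has to be wrapped in a struct. The names `Circle`, `Rect`, and `Shape` are made up for this sketch.

```d
// Run with: dmd -unittest -main -run example.d
import std.math : PI, isClose;
import std.sumtype;

struct Circle { double radius; }
struct Rect { double width, height; }

// Wrapper struct: the only place a user-defined member function can live.
struct Shape
{
    SumType!(Circle, Rect) data;

    this(Circle c) { data = typeof(data)(c); }
    this(Rect r) { data = typeof(data)(r); }

    double area()
    {
        return data.match!(
            (Circle c) => PI * c.radius * c.radius,
            (Rect r) => r.width * r.height
        );
    }
}

unittest
{
    assert(Shape(Rect(2.0, 3.0)).area() == 6.0);
    assert(isClose(Shape(Circle(1.0)).area(), PI));
}
```

With an AggregateBody, `area` could be declared directly inside the sumtype and the wrapper would be unnecessary.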

The only disadvantage is that you lose the ability to mix named 
integer values (like None, above) with typed members (like 
Some(T)).

However, there is a simple solution to this, which is to allow 
the programmer to declare fields of type `void`:

     sumtype Option(T)
     {
         void none;
         T some;
     }

This does not have to be a special-case feature of sumtypes; see 
the abandoned "Give unit type semantics to void" DIP [3] for a 
detailed description of how this could work as a general language 
feature.

> The most pragmatic approach for now is to simply disallow 
> taking the address of or a reference to a member of a SumType 
> in @safe code.

This is one valid approach. The other is to make writing to a 
sumtype value that contains pointers or references @system.

Keep in mind that merely calling a member function of a struct or 
class instance requires taking a reference to it, since the 
`this` parameter is passed by reference. So this limitation is 
actually quite severe.
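
A short sketch of why this is so: inside a struct member function, `this` refers to the caller's object rather than to a copy. `Probe` is a hypothetical type used only to show this.

```d
// Run with: dmd -unittest -main -run example.d
struct Probe
{
    int value;

    // `this` is passed by reference, so its address is the caller's object.
    const(void)* addressOfThis() const { return &this; }

    // Mutation through `this` is visible to the caller for the same reason.
    void increment() { ++value; }
}

unittest
{
    Probe p;
    assert(p.addressOfThis() is cast(const(void)*) &p);
    p.increment();
    assert(p.value == 1);
}
```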

> But since a sumtype with only enum members can be implemented 
> as an enum, the compiler should do that rewrite. Similarly, a 
> SumType with only one field declaration should be rewritten as 
> a struct (and the tag can be omitted). Furthermore, a sumtype 
> with an enum member with a value of 0 and a field declaration 
> that is a pointer can be rewritten as just a pointer.

Again, special cases like this are bad language 
design--especially in a language like D with powerful reflection 
and metaprogramming.

It's also inconsistent with existing language features. For 
example, if I declare a type like this:

     union Example { int n; }

...the compiler does not magically rewrite it as a struct, even 
though it's functionally equivalent to one.
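
This can be checked with reflection. In the sketch below, `Wrapper` is a hypothetical struct added for comparison.

```d
// Run with: dmd -unittest -main -run example.d
union Example { int n; }
struct Wrapper { int n; }   // the "functionally equivalent" struct

// Same layout, but reflection still reports a union, not a struct.
static assert(Example.sizeof == Wrapper.sizeof);
static assert(is(Example == union));
static assert(!is(Example == struct));

unittest
{
    Example e;
    e.n = 5;
    assert(e.n == 5);
}
```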

### References

1. [[no_unique_address]]: 
https://en.cppreference.com/w/cpp/language/attributes/no_unique_address
2. Non-nullable references: 
https://doc.rust-lang.org/std/primitive.reference.html
3. Give unit type semantics to void: 
https://github.com/dkorpel/DIPs/blob/dc1495cc2239729adb270012995c76809fe7f08c/DIPs/DIP1NNN-DK.md

