A proposal: Sumtypes

Fri Feb 16 14:34:36 UTC 2024

On Thursday, 8 February 2024 at 15:42:25 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
> Yesterday I mentioned that I wasn't very happy with Walter's 
> design of sum types, at least as per his write-up in his DIP 
> repository.
> I have finally after two years written up an alternative to it, 
> that should cover everything you would expect from such a 
> language feature.
> There are also a couple of key differences with regards to the 
> tag and ABI that will make value type exceptions aka zero cost 
> exceptions work fairly fast.

Thanks for the writeup. I read both DIPs. Honestly, both of them 
need improvement IMO. At present state, I prefer Walter's DIP, 
mainly because the details there are better nailed down.

## Problems in Walter's DIP

We don't want this special case for pointers - or at least it 
needs to be much, much more refined before it carries its 
weight. If I have `sumtype S { a, int* b }`, `S.a == S.b(null);`, 
right? Well, why doesn't the DIP say the same should happen with 
`sumtype S { a, Object b }`? An even more interesting case: 
`sumtype S { a, b, c, d, bool e}`. A boolean has 254 illegal bit 
patterns - shouldn't they be used for the tag in this case? And 
what happens with `sumtype S {a, int* b, int* c}`? Since we need 
space for a separate tag anyway, does it make sense for null `b` 
to be equal to `a`?

The proposed special case doesn't help much. If one wants a 
pointer and a special null value, one can simply use a pointer. 
On the other hand, one might want a pointer AND a separate tag 
value. To accomplish that, the user will have to either put the 0 
value to the end or do something like `sumtype S {int[0] a, int* 
b}`. Certainly doable, but it's a special case with no good 
reason.

The query expression is not a good idea. It introduces new 
syntax that isn't consistent with the rest of the language. Instead, 
I propose that each sumtype has a member function `has` that 
returns a DRuntime-defined nested struct with an `opDispatch` 
defined for querying the tag:

```D
sumtype Sum {int a, float b, dchar c}

auto sum = Sum.b(2.5);

assert(!sum.has.a);
assert(sum.has.b);
assert(!sum.has.c);
```
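
To show that `has` needs no new syntax, here is a rough sketch in today's D. It is only an approximation under my own assumptions: `Sum` is a hand-written tagged union standing in for the proposed `sumtype`, and the names `Tag`, `Has`, `aValue`, `bValue` and `cValue` are made up for illustration, not taken from either DIP.

```d
// Hand-written stand-in for `sumtype Sum {int a, float b, dchar c}`.
// All names below are hypothetical; neither DIP specifies this layout.
struct Sum
{
    enum Tag : ubyte { a, b, c }

    private Tag tag;
    private union
    {
        int aValue;
        float bValue;
        dchar cValue;
    }

    // Constructors named after the members, mirroring `Sum.b(2.5)`.
    static Sum a(int v)   { Sum s; s.tag = Tag.a; s.aValue = v; return s; }
    static Sum b(float v) { Sum s; s.tag = Tag.b; s.bValue = v; return s; }
    static Sum c(dchar v) { Sum s; s.tag = Tag.c; s.cValue = v; return s; }

    // Helper returned by `has`; `has.x` is forwarded to opDispatch!"x".
    static struct Has
    {
        private Tag tag;

        bool opDispatch(string name)() const
        {
            // Does not compile if `name` is not a member of Tag.
            return tag == __traits(getMember, Tag, name);
        }
    }

    Has has() const { return Has(tag); }
}

unittest
{
    auto sum = Sum.b(2.5f);
    assert(!sum.has.a);
    assert(sum.has.b);
    assert(!sum.has.c);
}
```

A misspelled member such as `sum.has.d` is rejected at compile time, because `opDispatch` only instantiates for names that exist in `Tag`.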

Alternatively, we can settle for simply providing a way for the 
user to get the tag of the sumtype. He can then use that tag the 
same way he would use a regular enum. In fact we will want to 
provide tag access in any case, because the sum type is otherwise 
too hard to use in `switch` statements.
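
A minimal sketch of what tag access buys us, again using a hand-written tagged union because the `sumtype` keyword doesn't exist yet. The field name `tag`, the nested `Tag` enum and the `describe` function are my assumptions, not something either DIP defines.

```d
import std.conv : text;

// Hand-written stand-in for a sum type whose tag is accessible.
struct Sum
{
    enum Tag : ubyte { a, b, c }

    Tag tag;
    union
    {
        int a;
        float b;
        dchar c;
    }
}

// `final switch` makes the compiler check that every tag is handled.
string describe(Sum s)
{
    final switch (s.tag)
    {
        case Sum.Tag.a: return text("int ", s.a);
        case Sum.Tag.b: return text("float ", s.b);
        case Sum.Tag.c: return text("dchar ", s.c);
    }
}

unittest
{
    Sum s;
    s.tag = Sum.Tag.a;
    s.a = 42;
    assert(describe(s) == "int 42");
}
```

If a member is later added to the sum type, every `final switch` over the tag stops compiling until the new case is handled, which is exactly the behaviour we already get from regular enums.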

## Problems in Rikki's DIP

Like Timon said, the types proposed don't seem to know whether 
they are supposed to be a unique type. Consider that any tuple 
can be used to initialise part of another tuple: `Tuple!(int, int, 
char, char)` can be initialised with `tuple(5, tuple(10, 
'x').expand, '\n')`. It makes sense - tuples are defined by their 
contents and beyond that have no identity of their own. However, 
there are excellent reasons why you can't do 
`std.datetime.StopWatch(999l.nullable.expand, 9082l)`. You aren't 
supposed to just declare any random bool and two longs as 
stopwatches just because their internal representation happens to 
be that. Structs are not just names for tuples, they're 
independent types that shouldn't be implicitly mixable unless the 
struct author explicitly declares so.
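
The difference can be sketched with Phobos as it exists today. `Celsius` and `Fahrenheit` are hypothetical structs I made up to have two types with identical layout; the tuple part uses only `std.typecons`.

```d
import std.typecons : Tuple, tuple;

// Hypothetical wrapper types: same layout, different meaning.
struct Celsius    { double degrees; }
struct Fahrenheit { double degrees; }

unittest
{
    // Tuples are structural: the contents of one tuple can be spliced
    // into the constructor of another.
    auto t = Tuple!(int, int, char, char)(5, tuple(10, 'x').expand, '\n');
    assert(t[0] == 5 && t[1] == 10 && t[2] == 'x' && t[3] == '\n');

    // Any tuple with the same contents is the very same type.
    static assert(is(typeof(tuple(10, 'x')) == Tuple!(int, char)));

    // Structs are nominal: identical layout does not make them convertible.
    static assert(!is(Celsius : Fahrenheit));
}
```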

By saying that a sumtype is always implicitly convertible to 
another sumtype that can structurally hold the same values, 
you're making it the tuple of sumtypes. If the user wants to 
protect the details, he must put it inside a struct or a union. 
But this feels wrong:

```D
struct MySumType
{	sumtype Impl = int a | float b | dchar c;
	Impl impl;
}
```
Why do I need to invent three names for this? If I want to define 
a tuple type that doesn't mix/match freely, I need just one name 
for the struct I use for that.

If you insist on this implicit conversion thing, I propose that 
sum types don't have names by default. Instead, they would become 
part of the type declaration syntax. `void` would be the type for 
members with no values beside the tag, and array indexes would be 
used for getting the members:
```D
double | float sumTypeInstance = 3.4;
alias SumTypeMixable = int | float | dchar;
struct SumTypeUnmixable
{	short | wchar | ubyte[2] members;
	alias asShort = members[0];
	alias asWchar = members[1];
	alias asBytePair = members[2];
}
```

Then again, the problem is how you would name the members 
this way. Maybe it can work with udas. `double a | float b | :c 
sumTypeInstance` could be rewritten to `@memberName(0, "a") 
@memberName(1, "b") @memberName(2, "c") double | float | void 
sumTypeInstance`. The compiler would check for those udas of the 
symbol when accessing members via name, and also propagate udas 
of an alias to any declaration done using it. I suspect this 
rabbit hole goes a bit too deep though:

```D
alias Type1 = int a | float b;
alias Type2 = int b | float a;

// What would be the member names of this? Sigh.
auto sumtype = [Type1.init, Type2.init];
```
So okay, I don't have very good ideas. Maybe we should just 
require putting the sum type inside another type if naming is 
desired.

There is more I could say on both of these DIPs, but I've spent a 
good deal of time on this post already. Maybe I'll do some more 
another time.