draft proposal for Sum Types for D
Quirin Schroll
qs.il.paperinik at gmail.com
Wed Nov 30 14:20:31 UTC 2022
On Tuesday, 29 November 2022 at 06:26:20 UTC, Walter Bright wrote:
> Go ahead, Make My Day! Destroy!
>
> https://github.com/WalterBright/DIPs/blob/sumtypes/DIPs/1NNN-(wgb).md
## Feedback
1. In *Alternative Syntax* the following is not supported by the
grammar you provided:
```d
sumtype Option(T) { None, Some(T) }
```
It should probably be
```d
sumtype Option(T) { None, T Some }
```
2. Maybe you should mention anonymous `sumtype` (cf. anonymous
`struct` and anonymous `union`).
3. As for a keyword, you could circumvent the problem by re-using
existing keywords. `enum union` would be a candidate.
4. “Members of sumtypes cannot have copy constructors, postblits,
or destructors.” Limits its application severely. Andrei had a
talk explaining how destruction of `std::variant` in C++ could be
done efficiently using `static foreach` – if C++ had it. I don’t
see how a copy constructor/postblit is evil.
5. “Sumtypes can be copied *if all its constituent types can be
copied*.”
## Discussion
It seems it tries to be too many things at once and that is
confusing. It tries to be:
1. An algebraic union with a tagged union as its special case
2. An optional type
3. Some form of non-nullable annotation.
Let’s talk about 3. first: The recognition of a magical pattern
is bad; it is backwards and leads to surprises. Were reference
types non-nullable from day one, `enum union { typeof(null), int*
}` would be a great candidate for a nullable pointer to `int`.
The same way `typeof(return)` is special-cased in the grammar
(the token `return` not being an expression), you could at least
special-case `null` (the keyword) as a possible case of a D sum
type. However, this is still a confusing design because it looks
like you’re *adding* a case, but actually it *removes* a value
form the overall type. I’ve heard of negative types in type
theory before, but to be honest, I don’t know much about it.
Better than seemingly *adding* `null` would be to add a
(visually) negative `null`: Allow the pseudo-member `!null` or
`-null`. It’s still weird and maybe even hard to teach, but at
least we can tell people: “The same way you can *add* something
to 2 to make it 1, namely −1, you can *add* `-typeof(null)` to
`int*`, you get an `int*` that cannot be `null`. For your
convenience, instead of `-typeof(null)` you can write `-null`.”
A sum type consisting of at least one reference type (pointers,
classes, AAs) may include `-null` as a special member that makes
the nullable options non-nullable.
We can even add syntax for simple non-nullable reference types:
`int*-null` or `int*\null` or `int*!` or whatever you like.
Likely, it’d be a lowering to a template in object.d named
`nonnull` or something similar.
As for 1. and 2., those make sense. Adding a single, distinct
value to a type makes this an optional type. A template
`optional` in Phobos would become a vocabulary type and makes
people not implement their own optionals and be implemented using
a sum type. This could even be added to object.d with syntax.
I still think that D would be a better and safer language if it
had non-null reference types: In such a hypothetical D, an `int*`
always points to an `int`, but an `int*?` might be `null`. For
value types, something like `int?` can be done in two ways:
Either reserve an otherwise invalid value as the null value
(possibly −2³²) or make `int?` an alias for `enum union
NullableInt { typeof(null), int }` – the first option is a
non-starter because of backward compatibility, but an entirely
new language could do that. User-defined types can have a
compile-time constant `opNull` that tells the compiler: “Not all
combinations of values for my members signify a valid object; use
this as the ‘null’ object instead of adding a boolean tag when
I’m combined with `?`.” It’s probably worth exploring how much of
that can be done and to what cost.
Why all this talk about optional types? Because they play well
with sum types. Asking for a possible variant of a sum type is
inherently returning an optional value. The first step to getting
sum types right is getting optional types right. That doesn’t
mean optional types cannot be a special case of sum types. In
that sense, optional types are the most relevant application of
sum types.
```diff
EnumUnionDeclaration:
`enum` `union` Identifier EnumUnionBody
EnumUnionTemplateDeclaration
+ AnonymousEnumUnionDeclaration
+
+ AnonymousEnumUnionDeclaration:
+ `enum` `union` EnumUnionBody
EnumUnionTemplateDeclaration:
`enum` `union` Identifier TemplateParameters Constraint
(opt) EnumUnionBody
EnumUnionBody:
`{` EnumUnionMembers `}`
EnumUnionMembers:
EnumUnionMember
EnumUnionMember `,`
EnumUnionMember `,` EnumUnionMembers
EnumUnionMember:
+ 'null'
+ '!null'
EnumMemberAttributes EnumUnionMember
EnumMember
FieldDeclaration
EnumMember:
- Identifier
- Identifier = AssignExpression
+ case Identifier
+ case Identifier = AssignExpression
FieldDeclaration:
- Type Identifier
- Type Identifier = AssignExpression
+ Type Identifier(opt)
+ Type Identifier(opt) = AssignExpression
QueryExpression:
`?` PostfixExpression `.` Identifier
```
I mentioned `null` and `!null` above. `null` is a shorthand for
`typeof(null)`. Omitting the `Identifier` is allowed only when
the `EnumMember` is the only one with its type and the
declaration is not anonymous.
Basically we have these categories of sum types:
* Optionals: 1 member (usually unnamed) plus `null`.
* Non-nulls: 1 member (usually unnamed) of reference type minus
`null` (= plus `!null`).
* Commutative: ≥ 2 members (usually unnamed) with different types.
* Homogeneous: ≥ 2 named members with the same type.
* Algebraic: All members named, potentially with repeated types,
potentially self-referential.
* Non-null commutative: ≥ 2 members (usually unnamed) with
different types among which are reference types plus `!null`
* Non-null algebraic: All members named, potentially with
repeated types among which are reference types (potentially
self-referential) plus `!null`.
A member can be an untyped named constant. Those are equivalent
to members of unique unit type. But if types are optional and
identifiers are optional, how do we know which is which? We
don’t. We need something to disambiguate. I went with `case` for
something that’s maybe acceptable. Here, `case` can be read as a
type: “Make a unique case type for this.” The `case` in `case x =
init` is `Case_x` defined as `struct Case_x { typeof(init) value
= init; }`. Because the type is a unit type, its value need not
be stored in the instance.
If all members are `case` members, the type is effectively a
plain-old `enum`. The union (not the tag field) can be elided as
the tag field suffices to retrieve the values. The only
difference to regular `enum` I see is that the DIP proposes that
for the tag field, the smallest available integral type be used,
whereas `enum` by default has `int` values.
I call the (usually unnamed) type-distinguished sum types
commutative because there’s no reason whatsoever to treat `int +
string` and `string + int` differently. There is *the* `int`
member and *the* string member. It’s the same type spelled
differently the same way ordering of type constructors or
function attributes (on e.g. function pointer types) not only
makes no difference, but creates the very same type. One does not
simply reorder a struct, but one can reorder a union and
commutative sum types are close enough. Algebraic sum types are
too much struct-like to be reordered (naming matters, knowing the
type to extract does not suffice).
The member querying syntax is okay, I guess it can be improved.
While `value.member?` reads best, it meaningfully conflicts with
the trinary operator, so the second best would be
`value.?member`. If a sum type is an optional type, i.e. it is
`enum union X {null, T}` for some `T`, it’d be great to have some
C#-esque `?.` and `??` operators as well: `optionalcat?.name`
returns an optional string: `null` if there is no cat and the
cat’s name if there is a cat. This conflicts with the trinary
operator as well, but the `?` followed by module-scope `.`
without a space should be virtually non-existent. For a nullable
value `v`, `v ?? d` returns `v` if it’s not `null` (but typed as
not null) and `d` (default, lazy evaluated) if it is. A `.` could
implicitly be `.?` if the member after it is followed up by `??`
or `?.`, i.e. `value.member?.toString()` is actually
`value.?member?.toString()` and `yourpet.cat ?? mycat` is
actually `yourpet.?cat ?? mycat` because you might not have a cat.
Querying for a commutative sum type would usually be done via the
type: The sum type converts implicitly to an optional of any of
its constituent types:
```d
// object.d
enum union SumType(Ts...) { Ts }
enum union NotNull(Ts...) if (/* any is reference type */) {
!null, Ts }
enum union Nullable(T) if (/* T is not reference type and has no
opNull */) { null, T }
// main.d
StringOrInt = SumType!(string, int);
StringOrInt stringOrInt;
if (string? s = stringOrInt) { }
else (int? i = stringOrInt) { }
else assert(0);
```
Also, `stringOrInt is int` and `stringOrInt !is int` would work.
For an associative array, `aa[key]` can return an optional value
instead of throwing a `RangeError`. That would enable `aa[key]
?? d`; and `aa[key] ??= value` can set the value only if `key` is
not present already.
For a (nullable) pointer `ptr`, the dereference expression `*ptr`
can be treated like an optional value if it is followed by `??`
or `?.`, and ptr?.member
I see one problem with `?.` vs. `.?`, but the rule is easy: The
question mark is where the optional is. For `.?`, the left
operand is a concrete value and the member on the right side
might or might not be present. If optional sum types are a common
thing, maybe we need `?.?`: `lhs?.?member` is: “Give me the
member’s value if `lhs` isn’t `null` and `member` is there;
otherwise give me `null`.”
Sorry for the long post.
More information about the Digitalmars-d
mailing list