D needs a type expression syntax
Quirin Schroll
qs.il.paperinik at gmail.com
Thu May 4 15:40:20 UTC 2023
**TL;DR:** If we make `TypeCtor` optional in the production rule
`BasicType` → `TypeCtor`**`(`**`Type`**`)`**, the D type grammar
can be improved and we have a way to solve the 14-year-old [issue
2753](https://issues.dlang.org/show_bug.cgi?id=2753).
---
#### What is an expression syntax?
In D, as in most programming languages, there is the concept of a
*[primary expression](https://dlang.org/spec/expression.html#primary_expressions)*, which, simply put, lets you put an arbitrary expression in parentheses and get an expression again. Without it, `(a + b) * c` wouldn’t even be expressible.
The scheme is like this:
[`Expression`](https://dlang.org/spec/expression.html#Expression) → [`PrimaryExpression`](https://dlang.org/spec/expression.html#primary_expressions)
[`PrimaryExpression`](https://dlang.org/spec/expression.html#primary_expressions) → **`(`**[`Expression`](https://dlang.org/spec/expression.html#Expression)**`)`**
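This is the ordinary expression rule in today’s D. Parentheses are required where they change the grouping and merely redundant where they don’t. A minimal sketch (the variable names are illustrative):

```d
void main()
{
    int a = 2, b = 3, c = 4;

    // Here the parentheses change the grouping: (2 + 3) * 4 vs. 2 + (3 * 4).
    assert((a + b) * c == 20);
    assert(a + b * c == 14);

    // Redundant parentheses are still allowed and mean the same thing.
    assert((a * b) + c == a * b + c);
}
```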
Imagine you could only use parentheses where they’re needed, so that
`(a * b) + c` would be an error because it means exactly the same as
`a * b + c`. This is how D’s types behave. The [type
grammar](https://dlang.org/spec/type.html#grammar) is quite a
mouthful and I reworked it in the past to make it somewhat
understandable for an outsider.
#### D types almost have an expression syntax
There’s one particular interaction that makes D’s types *almost*
have a primary expression:
[`Type`](https://dlang.org/spec/type.html#Type) →
[`BasicType`](https://dlang.org/spec/type.html#BasicType)
[`BasicType`](https://dlang.org/spec/type.html#BasicType) →
[`TypeCtor`](https://dlang.org/spec/type.html#TypeCtor)**`(`**[`Type`](https://dlang.org/spec/type.html#Type)**`)`**
This means a `Type` can be (among other options) just a
`BasicType`, and a `BasicType` can be (among other options) a
`TypeCtor` followed by a `Type` in parentheses. If we make the
`TypeCtor` optional, we get first-class type expression
syntax. We should do this *today* and – taking advantage of it –
do even more. (If you have experience with the parser, please let
me know if this would be a difficult change. To me, it doesn’t
seem like it would.)
#### Does it solve anything?
Yes. This isn’t just an academic, puritan, inner-monk-pleasing
exercise. D’s type syntax doesn’t let you express types that are
100 % valid and useful and doesn’t let you clarify your
intentions! Have you ever taken the address of a function that
returns by reference? Normally, the function pointer type is
written the same as a function declaration, just with the
function name replaced by the `function` keyword:
```d
bool isEven (int i) => i % 2 == 0;
bool function(int i) isEvenPtr = &isEven; // ok
ref int refId (ref int i) => i;
ref int function(ref int i) refIdPtr = &refId; // Doesn’t parse!
```
You can declare `refIdPtr` with `auto` because the type of
`&refId` is 100 % well-formed; the only problem is spelling it
out in code. If you `pragma(msg, typeof(refIdPtr))`, you get:
```d
int function(ref int i) ref
```
Interesting where the `ref` is, isn’t it? Go ahead, try using
*that* instead of `auto`. It doesn’t parse! And frankly, it
shouldn’t; it’s confusing to read.
The reason is that the grammar works by max munch, and we don’t
have the type in isolation: it’s part of a declaration. The `ref`
is parsed as a storage class for the declaration: It makes
`refIdPtr` a reference to an object of type `int function(ref
int)` – or, better, it would if it could. In this context,
references aren’t allowed. Additionally, the type and value
category of `&refId` don’t fit the declaration, but the parser
doesn’t even get there.
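As mentioned, `auto` sidesteps the problem entirely. A minimal sketch, with `refId` written using a block body instead of `=>` so it doesn’t depend on the shortened function syntax. The `pragma(msg, …)` line shows the compiler’s spelling of the type at compile time, and the assignment through the call confirms that the result really is returned by reference:

```d
ref int refId(ref int i) { return i; }

void main()
{
    // `auto` avoids having to spell out the ref-returning pointer type.
    auto refIdPtr = &refId;

    // Emits the compiler's spelling of the type during compilation.
    pragma(msg, typeof(refIdPtr));

    int x = 1;
    // The call yields an lvalue, so assigning to it writes through to x.
    refIdPtr(x) = 42;
    assert(x == 42);
}
```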
One way to do it is to use an alias:
```d
alias FP = ref int function(ref int);
FP refIdPtr = &refId;
```
Why, then, does the alias definition of `FP` parse? Essentially
because the alias declaration rules can be boiled down to this:
[`AliasDeclaration`](https://dlang.org/spec/declaration.html#AliasDeclaration) → **`alias`** *`Identifier`* **`=`** **`ref`** [`Type`](https://dlang.org/spec/type.html#Type)
Simply put, alias declaration rules accept it as a special case.
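Once the alias exists, the declared pointer has exactly the type of `&refId`, and that can be checked at compile time. A minimal, self-contained sketch reusing the names `FP` and `refId` from above:

```d
// The alias rule accepts a leading `ref`, so this names the type
// "pointer to a function that returns int by reference".
alias FP = ref int function(ref int);

ref int refId(ref int i) { return i; }

// The alias and the type of &refId are one and the same type.
static assert(is(typeof(&refId) == FP));

void main()
{
    FP refIdPtr = &refId;
    int x = 0;
    refIdPtr(x) += 5; // modifies x through the returned reference
    assert(x == 5);
}
```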
We can use `auto`, so what’s the deal? The deal is that there are
cases where `auto` cannot be used, e.g. in function parameter
lists. A function with a function pointer parameter of type `FP`
cannot be declared without an alias:
```d
void takesFP(ref int function(int) funcPtr)
{
    pragma(msg, typeof(funcPtr));
}
```
This compiles, but doesn’t work as intended: The parameter
`funcPtr` is of type `int function(int)` and taken by reference.
Max munch reads `ref` and sees a
[`ParameterStorageClass`](https://dlang.org/spec/function.html#ParameterStorageClass), then it sees the [`Type`](https://dlang.org/spec/type.html#Type) `int function(int)`. That’s perfectly valid and one could want that.
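To make the difference concrete, here is a sketch that contrasts the two readings with today’s syntax. A leading `ref` makes the parameter itself a reference, while a parameter whose type is a pointer to a ref-returning function can currently only be declared through an alias. The names `RefFP`, `counter`, `getCounter`, `plainId`, and `takesRefFP` are hypothetical:

```d
alias RefFP = ref int function(int); // pointer to a function returning int by reference

int counter;
ref int getCounter(int) { return counter; }
int plainId(int i) { return i; }

// Today's reading: `ref` is a parameter storage class, so funcPtr is an
// `int function(int)` passed by reference.
void takesFP(ref int function(int) funcPtr)
{
    funcPtr = &plainId; // rebinds the caller's variable
}

// A parameter of ref-returning function pointer type currently needs an alias.
void takesRefFP(RefFP funcPtr)
{
    funcPtr(0) = 7; // assigns through the returned reference
}

void main()
{
    int function(int) p = null;
    takesFP(p);
    assert(p is &plainId);

    takesRefFP(&getCounter);
    assert(counter == 7);
}
```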
Here’s the catch: We can solve a lot of syntax issues if we not
only make [`TypeCtor`](https://dlang.org/spec/type.html#TypeCtor)
optional (as suggested initially), but also allow `ref` as the
initial part of a [`Type`](https://dlang.org/spec/type.html#Type)
if followed by an appropriate
[`TypeSuffix`](https://dlang.org/spec/type.html#TypeSuffix): the
`function` and `delegate` ones.
([Here](https://github.com/dlang/dlang.org/pull/3446) is the
precise grammar change.)
This means that not only can you put types in parentheses to clarify
your intent, doing so also meaningfully affects parsing:
```d
void takesFP((ref int function(int)) funcPtr) { } // NEW! Doesn’t work yet.
```
Now, `ref` cannot be interpreted as a parameter storage class! It
must be the first token of a
[`Type`](https://dlang.org/spec/type.html#Type), which
requires a function or delegate type, and that is exactly what we
have.
This also applies to return types:
```d
ref int function(int) makesFPbyRef() { }
(ref int function(int)) makesByRefFP() { }
```
According to max munch parsing, the first function returns an
object of type `int function(int)` by reference, which is a
function pointer that returns by value.
The second function returns an object of type `ref int
function(int)` by value, which is a function pointer that returns
by reference. As soon as the parser sees the opening parenthesis,
it must parse a type.
The first of those should be deprecated in favor of this:
```d
ref (int function(int)) makesFPbyRef() { }
```
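Both return types can already be expressed today, but only by going through aliases. Here is a sketch with hypothetical names (`PlainFP`, `RefFP`, `stored`, `counter`, `getCounter`, `plainId`) that spells out each reading and uses it:

```d
alias PlainFP = int function(int);
alias RefFP = ref int function(int);

int counter;
ref int getCounter(int) { return counter; }
int plainId(int i) { return i; }

PlainFP stored;

// Max-munch reading of `ref int function(int) makesFPbyRef()`:
// a plain function pointer returned by reference.
ref PlainFP makesFPbyRef() { return stored; }

// The intended `(ref int function(int)) makesByRefFP()`: a pointer to a
// ref-returning function, returned by value. Today this needs an alias.
RefFP makesByRefFP() { return &getCounter; }

void main()
{
    makesFPbyRef() = &plainId; // writes to `stored` through the reference
    assert(stored is &plainId);

    makesByRefFP()(0) = 5;     // writes to `counter` through the returned pointer
    assert(counter == 5);
}
```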
The same goes for parameters:
```d
void takesFP(ref int function(int) funcPtr);   // Make this an error …
void takesFP(ref (int function(int)) funcPtr); // … and require this!
```
This is in the same spirit as [disallowing the nested
lambdas](https://dlang.org/changelog/2.098.0.html#ambiguous-lambda) `=> { }`. Together with that, we should deprecate applying [type constructors](https://dlang.org/spec/type.html#TypeCtor) to function and delegate types without clarification:
```d
const Object function() f0; // Make this an error …
const (Object function()) f1; // … and require this!
(const Object) function() f2; // To differentiate from this.
const(Object) function() f3; // (Same type as f2)
```
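You can check what the leading `const` means today with `is` expressions. The sketch below uses the hypothetical helpers `make` and `makeConst` and the hypothetical aliases `ConstPtr` and `PtrToConst`. It shows that `f0` and `f1` get the same type, and that only the `const(Object)` form moves `const` onto the return type instead of the pointer:

```d
Object make() { return new Object; }
const(Object) makeConst() { return new Object; }

// The types written out with today's TypeCtor ( Type ) syntax.
alias ConstPtr = const(Object function());   // const applied to the whole pointer type
alias PtrToConst = const(Object) function(); // mutable pointer, const(Object) return type

void main()
{
    const Object function() f0 = &make;  // leading const covers the whole pointer type
    const(Object function()) f1 = &make;
    const(Object) function() f3 = &makeConst;

    static assert(is(typeof(f0) == ConstPtr));
    static assert(is(typeof(f1) == ConstPtr));
    static assert(is(typeof(f3) == PtrToConst));

    f3 = &makeConst; // ok: f3 itself is mutable
    // f0 = &make;   // error: f0 itself is const
    assert(f0() !is null && f3() !is null);
}
```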
We should do the same for type constructors used as storage classes
on non-static member functions, i.e. when they appear in front of the
declaration:
```d
struct S
{
const void action() { } // Make this an error …
void action() const { } // … and require this!
}
```
D requires `ref` at the front; why should we have an issue with
requiring that type constructors go at the end?
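The leading form is confusing because it looks like it applies to the return type. The sketch below uses a hypothetical struct `S` with hypothetical members `get`, `legacy`, and `set`. It shows that a leading `const` on a member function is accepted today and makes the *function* const, while the return type stays plain `int`:

```d
struct S
{
    int value;

    // Trailing const: `this` is const inside get().
    int get() const { return value; }

    // Leading const is accepted today and also makes the member function
    // const; it does NOT make the return type const(int).
    const int legacy() { return value; }

    void set(int v) { value = v; }
}

void main()
{
    const S s = S(42);
    assert(s.get() == 42);    // ok: get() is callable on a const object
    assert(s.legacy() == 42); // ok as well, for the same reason
    // s.set(1);              // error: set() is not a const member function

    static assert(is(typeof(s.legacy()) == int)); // plain int, not const(int)
}
```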
#### Are there unintended side effects?
There would be another way to express `const(int)`: `(const int)`.
Because `const(int)` is everywhere, it cannot be deprecated, and
that’s fine. In my opinion, `(const int)` is better in every regard.
A newcomer would probably guess correctly that `const(int)[]` is a
mutable slice of read-only integers, but it’s nowhere near as clear
as `(const int)[]`. If we imagine D some years in the future, when
everyone uses “modern-style types,” i.e. `(const int)[]`,
`const(int)[]` will probably look weird: `const` normally applies
to everything that trails it, but here, because `const` is followed
by an opening parenthesis, it applies precisely to what is inside
the parentheses, nothing more.
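Today’s compiler makes the difference between a leading `const` and `const(…)` easy to check. A minimal sketch:

```d
void main()
{
    const int[] a = [1, 2, 3];  // leading const applies to everything after it
    const(int)[] b = [1, 2, 3]; // const applies only to what is in the parentheses

    static assert(is(typeof(a) == const(int[])));
    static assert(is(typeof(b) == const(int)[]));

    b = b[1 .. $]; // ok: the slice itself is mutable
    assert(b.length == 2);
    // a = a[1 .. $]; // error: a is const as a whole
    // b[0] = 5;      // error: the elements are const
}
```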