D needs a type expression syntax

Quirin Schroll qs.il.paperinik at gmail.com
Thu May 4 15:40:20 UTC 2023


**TL;DR:** If we make `TypeCtor` optional in the production rule 
`BasicType` → `TypeCtor`**`(`**`Type`**`)`**, the D type grammar 
can be improved and we have a way to solve the 14-year-old [issue 
2753](https://issues.dlang.org/show_bug.cgi?id=2753).

---

#### What is an expression syntax?

In D, as in most programming languages, there is the concept of a 
*[primary 
expression](https://dlang.org/spec/expression.html#primary_expressions),* that, simply put, lets you put an arbitrary expression in parentheses giving you an expression again. Without it, `(a + b) * c` wouldn’t even be expressible.

The scheme is like this:
 [`Expression`](https://dlang.org/spec/expression.html#Expression) → [`PrimaryExpression`](https://dlang.org/spec/expression.html#primary_expressions)
 [`PrimaryExpression`](https://dlang.org/spec/expression.html#primary_expressions) → **`(`**[`Expression`](https://dlang.org/spec/expression.html#Expression)**`)`**

Imagine you could only use parentheses where they’re needed and 
`(a * b) + c` would be an error, since `a * b + c` is in no way 
different. This is how D’s types behave. The [type 
grammar](https://dlang.org/spec/type.html#grammar) is quite a 
mouthful and I reworked it in the past to make it somewhat 
understandable for an outsider.

#### D types almost have an expression syntax

There’s one particular interaction that makes D’s types *almost* 
have a primary expression:
 [`Type`](https://dlang.org/spec/type.html#Type) → 
[`BasicType`](https://dlang.org/spec/type.html#BasicType)
 [`BasicType`](https://dlang.org/spec/type.html#BasicType) → 
[`TypeCtor`](https://dlang.org/spec/type.html#TypeCtor)**`(`**[`Type`](https://dlang.org/spec/type.html#Type)**`)`**

This means that a `Type` can be (among other options) just a 
`BasicType`, and a `BasicType` can be (among other options) a 
`TypeCtor` followed by a `Type` in parentheses. If we make the 
`TypeCtor` optional, we get first-class type expression 
syntax. We should do this *today* and – taking advantage of it – 
do even more. (If you have experience with the parser, please let 
me know if this would be a difficult change. To me, it doesn’t 
seem like it would.)
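
To make the status quo concrete, here is a minimal sketch of where parentheses are accepted in a type today: only directly after a type constructor. The alias names are made up for illustration, and the commented-out line shows the proposed spelling, which compilers did not accept at the time of writing.

```d
// Today, parentheses in a type must directly follow a type constructor.
alias SliceOfConst = const(int)[]; // mutable slice of const elements
alias ConstSlice   = const int[];  // `const` applies to the whole slice

static assert( is(ConstSlice == const(int[])));
static assert(!is(SliceOfConst == ConstSlice));

// `string` is spelled with the very same TypeCtor(Type) production.
static assert(is(string == immutable(char)[]));

// Proposed spelling with the TypeCtor left out (hypothetical, not valid yet):
// alias SliceOfConst2 = (const int)[];

void main() { }
```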

#### Does it solve anything?

Yes. This isn’t just an academic, puritan, inner-monk-pleasing 
exercise. D’s type syntax doesn’t let you express types that are 
100 % valid and useful and doesn’t let you clarify your 
intentions! Have you ever taken the address of a function that 
returns by reference? Normally, the function pointer type is 
written the same as a function declaration, just with the 
function name replaced by the `function` keyword:
```d
bool isEven  (int i) => i % 2 == 0;
bool function(int i) isEvenPtr = &isEven; // ok

ref int refId   (ref int i) => i;
ref int function(ref int i) refIdPtr = &refId; // Doesn’t parse!
```
You can declare `refIdPtr` with `auto` because the type of 
`&refId` is 100 % well-formed, it’s just a syntax issue spelling 
it out in code; if you `pragma(msg, typeof(refIdPtr))` you get:
```d
int function(ref int i) ref
```
Interesting where the `ref` is, isn’t it? Go ahead, try using 
*that* instead of `auto`. It doesn’t parse! And frankly, it 
shouldn’t; it’s confusing to read.

The reason is that the grammar works by max munch and we don’t 
have the type in isolation, it’s part of a declaration. The `ref` 
is parsed as a storage class for the declaration: It makes 
`refIdPtr` a reference to an object of type `int function(ref 
int)` – or, better, it would if it could. In this context, 
references aren’t allowed. Additionally, the type and value 
category of `&refId` don’t fit the declaration, but the parser 
doesn’t even get there.

One way to do it is to use an alias:
```d
alias FP = ref int function(ref int);
FP refIdPtr = &refId;
```
Why, then, does the alias definition of `FP` parse? Essentially 
because the alias declaration rules can boil down to this:
[`AliasDeclaration`](https://dlang.org/spec/declaration.html#AliasDeclaration) → **`alias`** *`Identifier`* **`=`** **`ref`** [`Type`](https://dlang.org/spec/type.html#Type)
Simply put, alias declaration rules accept it as a special case.
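
As a sanity check on the alias workaround, the following sketch compares the aliased type with what `auto` infers and then assigns through the returned reference. It uses a block body for `refId` instead of `=>` only so that it also builds on older compilers; the variable names are placeholders.

```d
ref int refId(ref int i) { return i; }

alias FP = ref int function(ref int);

void main()
{
    auto viaAuto  = &refId; // type inferred, nothing to spell out
    FP   viaAlias = &refId; // type spelled out through the alias

    // Both spellings name the same type ...
    static assert(is(typeof(viaAuto) == FP));
    // ... and it differs from the pointer type that returns by value.
    static assert(!is(FP == int function(ref int)));

    int x = 1;
    viaAlias(x) = 42; // the call yields an lvalue referring to x
    assert(x == 42);
}
```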

We can use `auto`, so what’s the deal? The deal is that there are 
cases where `auto` cannot be used, e.g. in function parameter 
lists. A function with a function pointer parameter of type `FP` 
cannot be declared without an alias:
```d
void takesFP(ref int function(int) funcPtr) { pragma(msg, typeof(funcPtr)); }
```
This compiles, but doesn’t work as intended: The parameter 
`funcPtr` is of type `int function(int)` and taken by reference. 
Max munch reads `ref` and sees a 
[`ParameterStorageClass`](https://dlang.org/spec/function.html#ParameterStorageClass), then it sees the [`Type`](https://dlang.org/spec/type.html#Type) `int function(int)`. That’s perfectly valid and one could want that.
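
The difference can be checked at compile time. The sketch below declares both variants, using an alias for the one that cannot be spelled directly, and inspects them with `typeof` and `__traits(getParameterStorageClasses, ...)`. The function names are invented for this example.

```d
alias FP = ref int function(int);

// `ref` binds to the parameter: a by-value-returning pointer, passed by reference.
void takesFPbyRef(ref int function(int) funcPtr)
{
    static assert(is(typeof(funcPtr) == int function(int)));
}

// With the alias, the parameter is passed by value and its type returns by reference.
void takesByRefFP(FP funcPtr)
{
    static assert(is(typeof(funcPtr) == FP));
}

// The first parameter of takesFPbyRef carries the `ref` storage class.
static assert(__traits(getParameterStorageClasses, takesFPbyRef, 0)[0] == "ref");

// The two function pointer types are distinct.
static assert(!is(FP == int function(int)));

void main() { }
```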

Here’s the catch: We can solve a lot of syntax issues if we not 
only make [`TypeCtor`](https://dlang.org/spec/type.html#TypeCtor) 
optional (as suggested initially), but also allow `ref` as the 
initial part of a [`Type`](https://dlang.org/spec/type.html#Type) 
if followed by an appropriate 
[`TypeSuffix`](https://dlang.org/spec/type.html#TypeSuffix): the 
`function` and `delegate` ones. 
([Here](https://github.com/dlang/dlang.org/pull/3446) is the 
precise grammar change.)

This means that parenthesizing a type not only clarifies your 
intent, it also meaningfully affects parsing:
```d
void takesFP((ref int function(int)) funcPtr) { } // NEW! Doesn’t work yet.
```
Now, `ref` cannot be interpreted as a parameter storage class! It 
must be the first token of a 
[`Type`](https://dlang.org/spec/type.html#Type), which 
necessitates a function or delegate type, but that’s what we 
indeed have.

This also applies to return types:
```d
  ref int function(int)  makesFPbyRef() { }
(ref int function(int)) makesByRefFP() { }
```
According to max munch parsing, the first function returns an 
object of type `int function(int)` by reference; that type is a 
function pointer that returns by value.
The second function returns an object of type `ref int 
function(int)` by value; that type is a function pointer that 
returns by reference. As soon as the parser sees the opening 
parenthesis, it must parse a type.
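
For comparison, both flavors can already be written today if the by-reference pointer type goes through an alias. This is only a sketch: `cell`, `slot`, `byVal` and `byRef` are helper names introduced for the example.

```d
alias ByValFP = int function(int);     // pointer to a function returning by value
alias ByRefFP = ref int function(int); // pointer to a function returning by reference

int cell;
int     byVal(int i) { return i; }
ref int byRef(int i) { cell = i; return cell; }

ByValFP slot;

// Returns a by-value function pointer *by reference*.
ref ByValFP makesFPbyRef() { return slot; }

// Returns a by-reference function pointer *by value*.
ByRefFP makesByRefFP() { return &byRef; }

void main()
{
    makesFPbyRef() = &byVal; // assigns to `slot` through the returned reference
    assert(slot is &byVal);
    assert(slot(5) == 5);

    makesByRefFP()(7) += 1;  // the pointed-to function yields a reference to `cell`
    assert(cell == 8);
}
```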

The first of those should be deprecated in favor of this:
```d
ref (int function(int)) makesFPbyRef() { }
```

The same goes for parameters:
```d
void takesFP(ref  int function(int)  funcPtr) // Make this an error …
void takesFP(ref (int function(int)) funcPtr) // … and require this!
```

This is in the same spirit as [disallowing the nested 
lambdas](https://dlang.org/changelog/2.098.0.html#ambiguous-lambda) `=> { }`. Together with that, we should deprecate applying [type constructors](https://dlang.org/spec/type.html#TypeCtor) to function and delegate types without clarification:
```d
  const Object  function()  f0; // Make this an error …
const (Object  function()) f1; // … and require this!
(const Object) function()  f2; // To differentiate from this.
  const(Object) function()  f3; // (Same type as f2)
```
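
How the two currently valid spellings differ can be verified with the compiler as it is. The sketch below keeps the names `f0` and `f3` from the listing above and only uses syntax that parses today.

```d
void main()
{
    const Object  function() f0 = null; // `const` covers the whole pointer type
    const(Object) function() f3 = null; // `const` covers only the return type

    static assert( is(typeof(f0) == const(Object function())));
    static assert( is(typeof(f3) == const(Object) function()));
    static assert(!is(typeof(f0) == typeof(f3)));

    // Consequently, f0 cannot be reassigned, but f3 can.
    static assert(!__traits(compiles, f0 = null));
    static assert( __traits(compiles, f3 = null));
}
```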
We should do the same for type constructors as storage classes 
for non-static member functions when used in front of the 
declaration:
```d
struct S
{
     const void action() { } // Make this an error …
     void action() const { } // … and require this!
}
```
D already requires `ref` at the front; why should we have an 
issue with requiring that type constructors go at the end?
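
A short sketch of why the leading form is misleading: the `const` in front qualifies `this`, not the return type. The member names `prefix` and `suffix` are chosen only for this illustration.

```d
struct S
{
    int x;
    const int prefix() { return x; } // reads like "returns const int", but isn't
    int suffix() const { return x; } // same meaning, stated where it applies
}

void main()
{
    const S s = S(3);

    // Both are callable on a const object, so `const` binds to `this` ...
    assert(s.prefix() == 3 && s.suffix() == 3);

    // ... and both return a plain, mutable int.
    static assert(is(typeof(s.prefix()) == int));
    static assert(is(typeof(s.suffix()) == int));
}
```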

#### Are there unintended side effects?

There would be another way to express `const(int)`: `(const 
int)`. Because `const(int)` is everywhere, it cannot be 
deprecated, and that’s fine. In my opinion, `(const int)` is 
better in every regard. A newcomer would probably guess correctly 
that `const(int)[]` is a mutable slice of read-only integers, but 
it’s nowhere near as clear as `(const int)[]`. If we imagine D 
some years in the future, when everyone uses “modern-style 
types,” i.e. `(const int)[]`, then `const(int)[]` will probably 
look weird: `const` normally applies to everything that trails 
it, but here, because `const` is followed by an opening 
parenthesis, it applies precisely to what is in there, nothing 
more.
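
The "parenthesis changes the reach of `const`" rule can be observed with today's syntax. This is a small sketch with invented variable names; it shows what may and may not be assigned in each case.

```d
void main()
{
    // `const` reaches only what is inside the parentheses: the elements.
    const(int)[] elemsConst = [1, 2, 3];
    elemsConst = elemsConst[1 .. $]; // the slice itself can be rebound
    assert(elemsConst.length == 2);
    static assert(!__traits(compiles, elemsConst[0] = 5));

    // Without parentheses, `const` reaches everything that trails it.
    const int[] allConst = [1, 2, 3];
    static assert(is(typeof(allConst) == const(int[])));
    static assert(!__traits(compiles, allConst = allConst[1 .. $]));
}
```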

