D needs a type expression syntax

Quirin Schroll qs.il.paperinik at gmail.com
Wed May 17 15:44:09 UTC 2023


*Note: The `opt`s made the grammar in quotations harder to read 
than necessary and aren’t essential to the quote, so I removed 
them.*

On Saturday, 13 May 2023 at 11:13:59 UTC, Nick Treleaven wrote:
>
> Just noticed we have this form for function literals:
> ```
> function RefOrAutoRef Type ParameterWithAttributes 
> FunctionLiteralBody2
> ```
> https://dlang.org/spec/expression.html#function_literals
>
> But for function types we have just:
> ```
>  TypeCtors BasicType function Parameters FunctionAttributes
> ```
> https://dlang.org/spec/type.html (inlining the *TypeSuffix* 
> form)
>
> It seems we could add this form to *Type* for consistency with 
> function literals:
> ```
> function RefOrAutoRef Type ParameterWithAttributes
> ```

Technically, that would work. I disagree with “for consistency,” 
however. The current consistency is that the order of keyword and 
type determines whether something is a type or a literal. Optional 
elements aside, a function literal looks like this:
```d
function int(int x) @safe => x
```
and its type is:
```d
int function(int x) @safe
```
You see, if the keyword (`function` or `delegate`) is first and 
the type is second, it’s a literal (an expression); if the type 
is first and the keyword is second, it’s a type. It doesn’t take 
long to understand this duality, even if it’s not pointed out 
anywhere directly. I find this really beautiful; it’s an 
elegant solution and easy to read. It’s a misfortune of remarkable 
proportions that C and C++ have a function pointer syntax (for 
C++ also member function pointer syntax) that is much harder to 
come up with _and_ a lot worse to read.
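
To make the duality concrete, here is a minimal sketch; the variable names are mine and purely illustrative. Note that a function literal gets additional attributes inferred (such as `pure` and `nothrow`), so its exact type is not identical to the hand-written one, but it converts to it implicitly:

```d
void main()
{
    // Keyword first, return type second: a function literal (an expression).
    auto fp = function int(int x) @safe => x;

    // Return type first, keyword second: a function pointer type.
    int function(int x) @safe sameShape = fp;

    // The literal's type (with its inferred attributes) implicitly
    // converts to the explicitly spelled-out type.
    static assert(is(typeof(fp) : int function(int) @safe));

    assert(sameShape(42) == 42);
}
```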

The problem D has is that while
```d
function ref int(ref int x) @safe => x
```
is valid syntax for the literal, the corresponding
```d
ref int function(ref int x) @safe;
```
is not a valid type – except in an `alias` declaration, where it 
is. This is the heart of the issue.
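
A small sketch of the workaround this currently forces, assuming a module-level variable `counter` and a helper `getCounter` that I made up for illustration. The `ref`-returning function pointer type has to be named through an `alias` before it can be used for a variable or parameter:

```d
int counter; // hypothetical module-level variable, only for this example

// A named function returning by reference.
ref int getCounter() @safe { return counter; }

// Accepted: inside an alias declaration, `ref` is part of the
// function pointer type.
alias RefFn = ref int function() @safe;

void main() @safe
{
    // Spelling the type out here instead of using the alias is what
    // the current grammar does not support.
    RefFn fp = &getCounter;

    fp() = 41; // the call result is an lvalue
    fp() += 1;
    assert(counter == 42);
}
```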

If I understand you correctly, you’d also allow
```d
function int(int x) @safe
```
as a type, that is, when the function doesn’t return by 
reference. (It would be really inconsistent if that wasn’t 
allowed.)

The distinguishing factor then is what follows this whole long 
sequence of tokens. If it’s a brace, `do` or `=>`, then it’s an 
object, otherwise it’s a type, meaning that programmers (and 
likewise the parser) have to read it to the end to know if this 
is a type or an expression.

> So this works:
> `void f(function ref int() g);`

But for the type of a function returning by value, you’d then 
have two syntaxes, right?
```d
function int()
int function()
```
I don’t think this would be a great thing.

> […]
> That would be a smaller impact change than parenthesized types.

Depends on how you measure impact or what you consider small:
* It requires a larger grammar change.
* Thus, it probably requires more code in the parser.
* It solves a very specific problem and introduces niche syntax.
* It breaks with an existing principle.

Considering that `ref int function()` already kind of is a type, 
namely in `alias` declarations, the proposed form doesn’t extend 
the trajectory of the current syntax, but goes in an entirely 
different direction.

The smaller-impact solution is to make `ref int function()` a 
first-class type; practically, this isn’t enough because there 
are token sequences that could in principle be read two ways: 
`ref` being the storage class of a function pointer parameter 
or indicating a function pointer parameter that returns by 
reference. These ambiguities are generally disambiguated by “max 
munch”. As in a lot of other cases, parentheses can disambiguate 
in the alternative direction. To be syntactically able to do so, 
types need primary expression syntax, which they even almost have.
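
To illustrate the two readings with what compiles today, here is a sketch using `std.traits`; the functions `f` and `h` and the alias `RefFn` are hypothetical names for this example. In a parameter list, the `ref` in front of a function pointer type is currently taken as the parameter’s storage class, and the other reading is only reachable through an `alias`:

```d
import std.traits : ParameterStorageClass, ParameterStorageClassTuple, Parameters;

alias RefFn = ref int function() @safe;

// Reading 1: `ref` is the storage class of parameter `g`;
// `g` is a by-value-returning function pointer passed by reference.
void f(ref int function() @safe g) {}

// Reading 2: `g` is passed by value and is a function pointer that
// returns by reference. Today this needs the alias.
void h(RefFn g) {}

alias fStc = ParameterStorageClassTuple!f;
alias hStc = ParameterStorageClassTuple!h;
alias FParams = Parameters!f;
alias HParams = Parameters!h;

static assert(fStc[0] == ParameterStorageClass.ref_);
static assert(is(FParams[0] == int function() @safe));

static assert(hStc[0] == ParameterStorageClass.none);
static assert(is(HParams[0] == RefFn));

// The two parameter types are genuinely different.
static assert(!is(FParams[0] == HParams[0]));

void main() {}
```

With parenthesized types, the second reading could presumably be spelled inline instead of going through `RefFn`.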

If syntax like `(const int)`, enabled by full-on type 
expression syntax, is deemed a problem (which would be weird 
because e.g. `inout(const int)` is already allowed), there is a 
narrower way to solve the `ref`-returning function problem: instead of
 [`BasicType`](https://dlang.org/spec/type.html#BasicType) → 
[`TypeCtor`](https://dlang.org/spec/type.html#TypeCtor)? 
**`(`**[`Type`](https://dlang.org/spec/type.html#Type)**`)`**
we can still do
 [`BasicType`](https://dlang.org/spec/type.html#BasicType) → 
[`TypeCtor`](https://dlang.org/spec/type.html#TypeCtor)? 
**`(`​`ref`** 
[`TypeCtor`](https://dlang.org/spec/type.html#TypeCtor)? 
[`BasicType`](https://dlang.org/spec/type.html#BasicType) 
**`function`** 
[`Parameters`](https://dlang.org/spec/function.html#Parameters) 
[`FunctionAttributes`](https://dlang.org/spec/function.html#FunctionAttributes)**`)`**
 [`BasicType`](https://dlang.org/spec/type.html#BasicType) → 
[`TypeCtor`](https://dlang.org/spec/type.html#TypeCtor)? 
**`(`​`ref`** 
[`TypeCtor`](https://dlang.org/spec/type.html#TypeCtor)? 
[`BasicType`](https://dlang.org/spec/type.html#BasicType) 
**`delegate`** 
[`Parameters`](https://dlang.org/spec/function.html#Parameters) 
[`MemberFunctionAttributes`](https://dlang.org/spec/function.html#MemberFunctionAttributes)**`)`**

I really don’t like this because the `ref` and the parentheses 
then require each other. As I stated earlier, it is as if `(A * 
B) + C` were invalid because the parentheses are redundant. As an 
example of the opposite approach, [Pony 
Lang](https://tutorial.ponylang.io/expressions/ops.html#precedence) 
has no operator precedence whatsoever, i.e. it requires `(A * B) + C`.

