Third and Hopefully Last Draft: Primary Type Syntax

Mon Sep 23 19:03:47 UTC 2024

On Sunday, 22 September 2024 at 10:58:55 UTC, Tim wrote:
> On Saturday, 21 September 2024 at 01:01:22 UTC, Quirin Schroll 
> wrote:
>> The obligatory 
>> [permalink](https://github.com/Bolpat/DIPs/blob/0562d0c1708f4f8bb79e72392218154ee39b1d4f/DIPs/DIP-2NNN-QFS.md) and [latest draft](https://github.com/Bolpat/DIPs/blob/PrimaryTypeSyntax/DIPs/DIP-2NNN-QFS.md)
>
> The grammar changes look good. I found some new ambiguities, 
> but the implementation seems to always prefer the old meaning, 
> so it should be no problem.

In general, ambiguities are resolved by Maximum Munch (MM): 
If the next token can be parsed as part of the entity that the 
grammar suggests, it will be; only if it can’t is the entity 
closed, or an error is reported.

> ### Attributes with optional parens
>
> ```d
> // deprecated (size_t) x1 = 1; // Syntax error
> // align (size_t) x2 = 1; // Syntax error
> // package (size_t) x3 = 1; // Syntax error
> // extern (size_t) x4 = 1; // Syntax error
> struct UDA{}
> // @UDA (size_t) x5 = 1; // Syntax error
> ```

Those all fall under Maximum Munch: A parenthesis following any 
of these attributes constitutes their optional arguments. 
Attributes with optional arguments are greedy.

I wasn’t even aware of `align` without argument.

The biggest one is `extern` because it’s realistically used with 
the new parsing. If you have a class `C`, `extern (C)` is 
ambiguous – except for Maximum Munch.
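
To illustrate the existing behavior that has to be preserved, here is a minimal sketch that compiles with today's grammar. The class `C` and the function `answer` are made up for this example; the point is only that the parens after `extern` are taken as the linkage argument even when a type named `C` is in scope.

```d
class C {} // hypothetical user type that happens to be named C

// Parsed as the linkage attribute `extern (C)`,
// not as `extern` followed by a parenthesized type `(C)`.
extern (C) int answer() { return 42; }

// The linkage confirms how the parens were interpreted.
static assert(__traits(getLinkage, answer) == "C");
```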

> The attributes `deprecated`, `align`, `package` and `extern` as 
> well as
> UDAs can be followed with optional arguments in parens, like the
> deprecation message. These parens are now ambiguous with a 
> basic type in
> parens.
>
> The implementation seems to always try to parse the parens as 
> arguments
> for the attribute, so it remains backward compatible.

Yes, and it follows MM, which is generally something programmers 
can rely on.

What can be done about those? For one:
```d
attribute
{
     declaration;
}
```
This always works at declaration scope, but it is not possible at 
statement scope. There, I thought one could use an empty UDA 
list `@()`, but those are expressly illegal, so one has to resort 
to a dummy UDA like `@("")`. Not nice, but if you insist on 
expressing something at statement scope in one go, I guess we 
can ask the programmer for some concessions.

> Maybe this could be confusing for the user, when a declaration 
> uses a type in parens and later an attribute is added.

There’s unfortunately little that can be done about it. A better 
implementation can possibly backtrack and re-interpret what used 
to be an attribute’s argument as a basic type, but to be honest, 
that is a lot of work.

> ### Scope guards
>
> ```d
> alias exit = Object;
> Object x1;
> void main()
> {
>     scope (exit) x1 = new Object(); // Still a scope guard
>     // scope (Object) x2 = new Object(); // Syntax error
>     // scope (int) x3 = 3; // Syntax error
>     @0 scope (exit) x4 = new Object(); // Declares variable with type exit
> }
> ```

The big issue with these is, basically, that IMO this _must_ work:
```d
scope (ref void function())* fpp = null;
```

And it doesn’t.

> The first statement is a scope guard with the current grammar. 
> With the
> new grammar it could also be a variable declaration of type 
> `exit` and
> storage class `scope`. The implementation still parses it as a 
> scope
> guard, so it remains backward compatible.

IIRC, I ran into this and implemented a look-ahead to handle 
scope guards correctly. Scope guards use magic identifiers, and 
unlike `__traits` or `pragma`, `scope` can also appear without 
arguments.

> The next line could also be a variable declaration, but it is 
> still
> parsed as a scope guard. DMD then prints an error, because 
> `Object`
> is not a valid scope identifier. The line with `x3` is a syntax 
> error
> for the same reason.

I just fixed that because it was fairly easy to do so. My 
implementation now looks ahead to see if it’s `scope(exit)`, 
`scope(success)`, or `scope(failure)`, and if it’s not, it tries 
to parse it as the `scope` attribute.
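
Roughly, the decision looks like the following toy sketch. It is not the actual parser code: `Kind`, `classifyScope`, and the use of plain strings as tokens are assumptions made for illustration only.

```d
enum Kind { scopeGuard, scopeAttribute }

/// Decides how `scope` is treated, given the tokens that follow it.
/// Hypothetical helper; tokens are plain strings for simplicity.
Kind classifyScope(const string[] after)
{
    // A scope guard is exactly `(` + magic identifier + `)`.
    if (after.length >= 3 && after[0] == "(" && after[2] == ")")
    {
        switch (after[1])
        {
            case "exit", "success", "failure":
                return Kind.scopeGuard;
            default:
                break;
        }
    }
    // Anything else: `scope` is the storage class, and any parens
    // that follow start a parenthesized type.
    return Kind.scopeAttribute;
}
```

With this, `scope (exit) x1 = ...` stays a scope guard, while `scope (int) x3 = 3;` and `scope (ref void function())* fpp = null;` fall through to the attribute case.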

> The last statement is parsed as a variable declaration, because 
> scope
> guards can't have UDAs.

This is interesting. It’s unlikely that something like that is 
going to be a real-world problem, though, as it requires several 
unlikely things at once: someone naming a type `exit`, putting 
parentheses around it, and using a UDA at statement scope.

My fix from above doesn’t change that, but again, it’s really 
unlikely to be in code anyways.

> ### Function literals
>
> ```d
> auto test1 = function (float){return 0;};
> // auto test2 = function (float)(int){return 0;}; // Syntax error
> ```
>
> Function literals have an optional return type and optional 
> parameters.
> The type `float` for `test1` could be a parameter or a return 
> type in
> parens. The implementation always parses the parens as 
> parameters,
> so it remains backward compatible.

Yes, for backwards compatibility, it must be done that way. 
However, this is a MM violation and must be mentioned in the DIP.

> The second function literal has both a return type and 
> parameters, but
> it results in a syntax error, because the parens are parsed as
> parameters and no other parens are expected after that.

The second one should be allowed; otherwise some things aren’t 
expressible. This should work because there’s no valid reason why 
it can’t:
```d
auto fp = function (ref int function()) () => null;
```

However, this currently works and must keep its behavior:
```d
auto fp = function (ref int function()) => null;
static assert(is(typeof(fp) : typeof(null) function(ref int function())));
```

The implementation will do a look-ahead to figure out if it’s 
seeing `(Parameters) FunctionLiteralBody` or `(Type)(Parameters) 
FunctionLiteralBody`.

It might be noteworthy that this is not a MM violation. There is 
no other way to parse `(Type)(Parameters) FunctionLiteralBody`.
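
As a rough idea of what that look-ahead amounts to, here is a toy sketch over string tokens. It is not the real implementation: `matchingParen` and `hasParenthesizedReturnType` are hypothetical names, and a real parser works on its own token stream.

```d
/// Index of the `)` matching the `(` at tokens[open],
/// or tokens.length if the parens are unbalanced.
size_t matchingParen(const string[] tokens, size_t open)
{
    int depth = 0;
    foreach (i; open .. tokens.length)
    {
        if (tokens[i] == "(")
            ++depth;
        else if (tokens[i] == ")" && --depth == 0)
            return i;
    }
    return tokens.length;
}

/// `tokens` are the tokens after `function` or `delegate`.
/// True if the first paren group is directly followed by a second
/// one, i.e. the shape is `(Type)(Parameters) FunctionLiteralBody`.
bool hasParenthesizedReturnType(const string[] tokens)
{
    if (tokens.length == 0 || tokens[0] != "(")
        return false;
    const close = matchingParen(tokens, 0);
    return close + 1 < tokens.length && tokens[close + 1] == "(";
}
```

For `function (ref int function()) () => null` the first group is followed by `(`, so it is a return type; for `function (ref int function()) => null` it is followed by `=>`, so the group stays a parameter list as before.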

> ### Anonymous classes
>
> ```d
> void main()
> {
>     auto o1 = new class (Object) {};
> }
> ```
>
> The parens could be constructor arguments or a basic type in
> `AnonBaseClassList?`. The implementation always tries to parse
> constructor arguments, which should be fine.

I’m going to look into this. Probably this is low-priority because 
a base class or interface name following `new class` never 
requires parens. But it should not be an error either. Probably 
I’ll do the same as with function literals: Look ahead and see if 
there’s another set of parens. If yes, it’s `new class 
(Type)(Arguments) {}`. If not, it’s `new class /*implicit 
Object*/(Arguments) {}` because of backwards compatibility.
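
For reference, this is the existing single-paren form that has to keep working, as a small sketch that compiles today. The interface `Named` and the function `makeNamed` are invented for the example.

```d
interface Named { string name(); } // hypothetical interface

Named makeNamed()
{
    // The parens directly after `new class` are constructor
    // arguments; `Named` is the unparenthesized base list.
    return new class ("anon") Named
    {
        private string n;
        this(string n) { this.n = n; }
        string name() { return n; }
    };
}
```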

I’ll commit my stuff probably tomorrow. I can’t do it now, 
unfortunately.