Attributes (lexical)

Thu Nov 25 12:09:55 UTC 2021

On Thursday, 25 November 2021 at 10:41:05 UTC, Rumbu wrote:
> Well:
>
> ```
> #line IntegerLiteral Filespec? EndOfLine
> ```
>
> Having EndOfLine at the end means for me that there are no 
> other EOLs between, otherwise this syntax should pass but it's 
> not (DMD last):
>
> ```d
> #line 12
> "source.d"
> ```

The lexical grammar section starts with:

> The source text is decoded from its source representation into 
> Unicode Characters. The Characters are further divided into: 
> WhiteSpace, EndOfLine, Comments, SpecialTokenSequences, and 
> Tokens, with the source terminated by an EndOfFile.

What it fails to mention is that, in the lexical grammar rules, 
a space denotes 'immediate concatenation' of the characters/rules 
before and after it, e.g.:
```
DecimalDigits:
     DecimalDigit
     DecimalDigit DecimalDigits
```
`3 1  4` is not a single `IntegerLiteral`; it needs to be `314`.
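
To make the "immediate concatenation" reading concrete, here is a minimal sketch that applies the `DecimalDigits` rule exactly as quoted. The helper name `isDecimalDigits` is made up for illustration, and the real `IntegerLiteral` rules cover more cases than this.

```d
import std.ascii : isDigit;

/// Hypothetical helper: true if `s` matches the quoted rule
///   DecimalDigits: DecimalDigit | DecimalDigit DecimalDigits
/// read literally, i.e. with nothing allowed between the digits.
bool isDecimalDigits(string s)
{
    if (s.length == 0 || !isDigit(s[0]))
        return false;
    // Either a single DecimalDigit, or a DecimalDigit
    // immediately followed by more DecimalDigits.
    return s.length == 1 || isDecimalDigits(s[1 .. $]);
}

static assert( isDecimalDigits("314"));
static assert(!isDecimalDigits("3 1  4")); // the spaces break the match
```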

In the parsing grammar, on the other hand, it should mention that 
spaces denote concatenation of *Tokens*, with arbitrary *Comments* 
and *WhiteSpace* in between. So the rule:
```
AtAttribute:
     @ nogc
```
means: an `@` token, followed by arbitrary comments and whitespace, 
followed by an identifier token that equals "nogc". That explains 
your first example.
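
A small illustration of that reading, assuming DMD behaves as described here (I have not checked other compilers). The names `f` and `caller` are just placeholders.

```d
// The `@` token and the `nogc` identifier token are separated by a
// comment, a line break and some spaces.
@ /* a comment between the two tokens */
  nogc
void f() {}

// This only compiles if `f` really ended up being @nogc.
void caller() @nogc
{
    f();
}
```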

Regarding this lexical rule:
```
#line IntegerLiteral Filespec? EndOfLine
```
This is already wrong from a lexical standpoint: read with the 
immediate-concatenation notation above, it would suggest a 
SpecialTokenSequence looks like this:
```
#line10"file"
```

The implementation actually looks for a `#` token, skips 
*WhiteSpace* and *Comments*, looks for an identifier token 
("line"), and then goes into a custom loop that allows 
separation by *WhiteSpace* but not by *Comments*. The first 
'\n' is assumed to be the final *EndOfLine*, which is why 
this fails:
```
#line 12
"source.d"
```
It thinks it's done after "12".
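
Roughly, the custom loop behaves like the sketch below. This is a simplified model and not dmd's actual code: `LineDirective` and `scanLineDirective` are made-up names, only spaces and tabs count as whitespace, and only a plain double-quoted file name is handled.

```d
import std.ascii : isDigit;

struct LineDirective
{
    bool ok;      // true if the text formed a complete directive
    uint line;    // the IntegerLiteral
    string file;  // the Filespec without quotes, empty if absent
}

/// Hypothetical model: `rest` is the source text right after `#line`.
LineDirective scanLineDirective(string rest)
{
    LineDirective r;
    size_t i = 0;

    // Spaces and tabs may separate the parts; comments may not.
    void skipBlanks()
    {
        while (i < rest.length && (rest[i] == ' ' || rest[i] == '\t'))
            ++i;
    }

    skipBlanks();
    if (i == rest.length || !isDigit(rest[i]))
        return r;
    while (i < rest.length && isDigit(rest[i]))
        r.line = r.line * 10 + (rest[i++] - '0');

    skipBlanks();
    if (i < rest.length && rest[i] == '"')
    {
        const start = ++i;
        while (i < rest.length && rest[i] != '"' && rest[i] != '\n')
            ++i;
        if (i == rest.length || rest[i] != '"')
            return r; // unterminated file name
        r.file = rest[start .. i];
        ++i;
        skipBlanks();
    }

    // The first '\n' (or end of input) ends the directive.
    r.ok = i == rest.length || rest[i] == '\n';
    return r;
}

unittest
{
    // File name on the same line: picked up as the Filespec.
    assert(scanLineDirective(" 12 \"source.d\"\n").file == "source.d");

    // File name on the next line: the directive already ended at '\n',
    // so the string is left over as ordinary source text.
    auto d = scanLineDirective(" 12\n\"source.d\"");
    assert(d.ok && d.line == 12 && d.file == "");
}
```

The `unittest` block can be run with `dmd -main -unittest -run` on the file.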

In conclusion the specification should:
- define the notation used in lexical / parsing grammar blocks
- clearly distinguish lexical / parsing blocks
- fix up the `SpecialTokenSequence` definition (and maybe change 
dmd as well)

By the way, the parsing grammar defines:
```
LinkageType:
     C
     C++
     D
     Windows
     System
     Objective-C
```
C++ and Objective-C cannot currently be single tokens, so they 
are actually two and three tokens respectively, which is why 
these are allowed:

```D
extern(C
        ++)
void f() {}

extern(Objective
        -
        C)
void g() {}
```
This should also be fixed in the spec.

> I am not asking this questions out of thin air, I am trying to 
> write a conforming lexer and this is one of the ambiguities.

That's cool! Are you writing an editor plugin?