Mihaela Chirea - SAOC 2020 Milestone 2 Update 4 - Improving DMD as a Library

Fri Nov 20 14:24:14 UTC 2020

On Thursday, 19 November 2020 at 22:07:23 UTC, Mihaela Chirea 
wrote:

> - The location should start at the first byte of the AST node 
> to preserve the context. Some nodes already had locations 
> before I started working on this project, but I noticed that 
> not all of them follow this rule. VarDeclaration and 
> FuncDeclaration, for example, start at the name of the 
> variable/function, skipping the type and the storage classes. 
> As these locations are used for errors, I was thinking of 
> adding another variable to these classes instead of replacing 
> the value of this one.

Yes, I think the simplest would be to add another field. Ideally
the compiler would store only one field, the actual start of the
declaration, and then extract the location of the identifier when
it needs to report an error.
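To illustrate the "store only the start" idea, here is a rough D sketch. `identifierOffset` is a hypothetical helper, not part of DMD. It naively rescans the source from the declaration's start and returns the offset of the first identifier token that matches the declared name. It does not handle comments, string literals, or a type that happens to have the same name as the variable.

```d
// Hypothetical sketch, not DMD code: the declaration stores only its start
// offset, and the identifier's offset is recomputed on demand (for example
// when reporting an error). Deliberately naive: comments, string literals
// and a type with the same name as the variable are not handled.
import std.ascii : isAlpha, isAlphaNum;
import std.stdio : writeln;

/// Returns the byte offset of the first identifier token equal to `name`,
/// scanning `source` from `start`, or `size_t.max` if there is none.
size_t identifierOffset(string source, size_t start, string name)
{
    size_t i = start;
    while (i < source.length)
    {
        immutable c = source[i];
        if (isAlpha(c) || c == '_')
        {
            // Consume a whole identifier token.
            immutable tokStart = i;
            while (i < source.length && (isAlphaNum(source[i]) || source[i] == '_'))
                i++;
            if (source[tokStart .. i] == name)
                return tokStart;
        }
        else
        {
            i++; // skip anything that cannot start an identifier
        }
    }
    return size_t.max;
}

void main()
{
    // The declaration starts at offset 0 (`const`); the name `a` is at 10.
    writeln(identifierOffset("const int a = 1;", 0, "a")); // 10
}
```

The cost of this design is a rescan on every error, in exchange for one location field per declaration.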

> - In the case of UserAttributeDeclaration, consecutive 
> expressions are all put into the same node.

Does that apply to these forms as well:

@uda1
{
     int a = 3;
}

@uda2:

int b = 4;

> While the expressions already have their location set, none 
> contain the `@` symbol. In a situation like `@uda1 @uda2 @uda3` 
> moving this location just a bit to the left wouldn't be a 
> problem. However, `@(uda1, uda2, uda3)` is represented in the 
> exact same way by the parser: one UserAttributeDeclaration node 
> with 3 expressions. So who gets the `@` in this case? My idea 
> would be to set the start location of UserAttributeDeclaration 
> to the first `@` symbol in the group, and, later, set the end 
> location to wherever the last symbol is. The problem with this 
> approach is that all attributes, both storage classes and user 
> defined, can be applied on the same variable/function, so the 
> UserAttributeDeclaration range can contain more than just user 
> defined attributes.

First I have to say, I do not fully understand how the AST works
for UDAs. When I tried to solve this, I noticed that
`UserAttributeDeclaration` contains an array of expressions. The
natural thing would be to store each UDA in its own element of
that array. But that's not how it works, IIRC. Instead, the UDAs
are stored in a tuple expression (which itself contains an array
of expressions) as the first element. Then, once enough UDAs are
attached to a declaration, it eventually starts using the second
element of the array in `UserAttributeDeclaration`. I never
understood the logic behind this.

Yes, that's problematic. I think what needs to happen is to add
another AST node type. I'm thinking of one node type that covers
all UDAs of a single declaration; as far as I understand, that is
what `UserAttributeDeclaration` is now. Then each individual `@`
attribute (either `@uda` or `@(...)`) needs its own node type as
well; let's call it `UserAttributeDeclarationItem`. The outer node
would not need a location, but the inner nodes would.

For your examples above:

`@uda1 @uda2 @uda3 void foo();`

Would be represented as one `UserAttributeDeclaration`
containing three `UserAttributeDeclarationItem`s, each containing
one `IdentifierExp`.

`@(uda1, uda2, uda3) nothrow @(uda4, uda5, uda6) void foo();`

Would be represented as one `UserAttributeDeclaration`
containing two `UserAttributeDeclarationItem`s, each containing
three `IdentifierExp`s.
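A minimal D sketch of the proposed shape, built for the second example. Only the two class names come from the discussion above; `Loc`, the fields, and this tiny `IdentifierExp` are simplified stand-ins made up for illustration, not DMD's actual definitions. Columns are 1-based and assume the whole example is on line 1.

```d
// Hypothetical sketch of the proposed two-level UDA representation. The two
// class names come from the discussion; `Loc`, the fields and this tiny
// `IdentifierExp` are simplified stand-ins, not DMD's definitions.
import std.stdio : writeln;

struct Loc
{
    uint line;
    uint column; // 1-based
}

class IdentifierExp
{
    string name;
    Loc loc;
    this(string name, Loc loc) { this.name = name; this.loc = loc; }
}

/// One `@uda` or `@(uda1, uda2, ...)`; `loc` points at its `@`.
class UserAttributeDeclarationItem
{
    Loc loc;
    IdentifierExp[] exps;
    this(Loc loc, IdentifierExp[] exps) { this.loc = loc; this.exps = exps; }
}

/// All UDAs attached to one declaration; needs no location of its own.
class UserAttributeDeclaration
{
    UserAttributeDeclarationItem[] items;
    this(UserAttributeDeclarationItem[] items) { this.items = items; }
}

/// Builds the tree for (all on line 1):
/// `@(uda1, uda2, uda3) nothrow @(uda4, uda5, uda6) void foo();`
UserAttributeDeclaration exampleDecl()
{
    return new UserAttributeDeclaration([
        new UserAttributeDeclarationItem(Loc(1, 1), [
            new IdentifierExp("uda1", Loc(1, 3)),
            new IdentifierExp("uda2", Loc(1, 9)),
            new IdentifierExp("uda3", Loc(1, 15)),
        ]),
        new UserAttributeDeclarationItem(Loc(1, 29), [
            new IdentifierExp("uda4", Loc(1, 31)),
            new IdentifierExp("uda5", Loc(1, 37)),
            new IdentifierExp("uda6", Loc(1, 43)),
        ]),
    ]);
}

void main()
{
    foreach (item; exampleDecl().items)
        writeln("item at column ", item.loc.column, " with ", item.exps.length, " UDAs");
}
```

With this layout the question "who gets the `@`" goes away: each item owns its own `@`, and `nothrow` sits between the two items without being part of either.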

> - A StorageClassDeclaration node is not always created when 
> meeting a storage class. Most nodes have a StorageClass field 
> where this information is stored in the form of a ulong 
> variable, and therefore there is no node to attach its location 
> to. For example, `const a = 1;` generates a VarDeclaration node 
> with the location at `a` and the storage class with the const 
> bit set. Putting consecutive storage classes into the same node 
> happens here as well.

Hmm, I see. In my opinion, either the parser needs to be modified
to output a `StorageClassDeclaration` for that variable
declaration, or the location of the variable declaration needs to
move so that it starts at the `c` in `const`.
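One way to picture keeping the storage-class information while still having locations is to keep the `ulong` bitmask but also record where each storage-class keyword starts. The sketch below is hypothetical: `STC`, its flag values, `StorageClassToken`, and `leadingStorageClasses` are invented for illustration and only know four keywords. DMD's real storage-class constants and parser are different.

```d
// Hypothetical sketch, not DMD code: keep the `ulong` storage-class bitmask,
// but also record the offset of each storage-class keyword in front of a
// declaration. Flag values and the four-keyword list are made up.
import std.ascii : isAlphaNum;
import std.stdio : writeln;

enum STC : ulong
{
    none       = 0,
    auto_      = 1UL << 0,
    const_     = 1UL << 1,
    immutable_ = 1UL << 2,
    static_    = 1UL << 3,
}

struct StorageClassToken
{
    STC stc;       // which storage class
    size_t offset; // byte offset of the keyword in the source
}

STC keywordToStc(string word)
{
    switch (word)
    {
        case "auto":      return STC.auto_;
        case "const":     return STC.const_;
        case "immutable": return STC.immutable_;
        case "static":    return STC.static_;
        default:          return STC.none;
    }
}

/// Collects the storage-class keywords starting at `start`, stopping at the
/// first word (or non-word character) that is not one of them.
StorageClassToken[] leadingStorageClasses(string source, size_t start)
{
    StorageClassToken[] result;
    size_t i = start;
    while (true)
    {
        while (i < source.length && (source[i] == ' ' || source[i] == '\t'))
            i++;
        immutable wordStart = i;
        while (i < source.length && (isAlphaNum(source[i]) || source[i] == '_'))
            i++;
        immutable stc = keywordToStc(source[wordStart .. i]);
        if (stc == STC.none)
            break;
        result ~= StorageClassToken(stc, wordStart);
    }
    return result;
}

void main()
{
    auto toks = leadingStorageClasses("static const a = 1;", 0);
    ulong mask = 0;
    foreach (t; toks)
        mask |= t.stc;                           // the bitmask is still available
    writeln(toks);                               // each keyword with its offset
    writeln(mask == (STC.static_ | STC.const_)); // true
}
```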

> To deal with these situations, I need to know a bit more about
> the desired use cases. What information should we be able to get
> from these nodes?

I can't say in detail exactly which information should be
available. I'm thinking more at a high level, about which tools
should be possible to build. For example, I think it's reasonable
to expect a refactoring tool to be able to automatically change
local variables to `const`:

auto a = 3;

Would be turned into:

const a = 3;

If `a` is never modified.
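Assuming the AST could give the tool the byte offset of the `auto` keyword, the textual part of that refactoring is small. Here is a sketch with a hypothetical helper, `replaceKeyword`. Deciding whether `a` is actually never modified is the hard part and is not shown.

```d
// Hypothetical sketch of the text edit behind an "auto -> const" refactoring,
// assuming the AST provides the byte offset of the `auto` keyword. Checking
// that the variable is never modified is not shown.
import std.exception : enforce;
import std.stdio : writeln;

/// Replaces the keyword `oldKw` found at `offset` with `newKw`.
/// Throws if `oldKw` is not at that offset.
string replaceKeyword(string source, size_t offset, string oldKw, string newKw)
{
    enforce(offset + oldKw.length <= source.length
            && source[offset .. offset + oldKw.length] == oldKw,
            "no `" ~ oldKw ~ "` at the given offset");
    return source[0 .. offset] ~ newKw ~ source[offset + oldKw.length .. $];
}

void main()
{
    writeln(replaceKeyword("auto a = 3;", 0, "auto", "const")); // const a = 3;
}
```

With only the current location (pointing at `a`), the tool would have to guess where `auto` is; with the start of the declaration it knows.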

As for the UDAs, consider a refactoring tool that can rename
symbols (such tools exist for several languages today):

enum uda1;

@uda1 int a = 3;
@(uda1) void foo();

Say that, in the code above, you want to rename the enum. Then
the tool needs to rename all usages of that symbol, which in this
case are the UDAs. For that, it needs the location of each
individual UDA.
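Assuming the AST can report the byte offset of every usage of `uda1`, including each UDA, the rename itself could look roughly like the hypothetical `renameAt` helper below. It applies the edits from the last offset to the first so that the earlier offsets stay valid.

```d
// Hypothetical sketch of the rename step, assuming the AST can report the byte
// offset of every usage of the symbol, including each individual UDA.
import std.algorithm.sorting : sort;
import std.exception : enforce;
import std.stdio : write;

/// Replaces `oldName` with `newName` at each of the given byte offsets.
string renameAt(string source, const(size_t)[] offsets, string oldName, string newName)
{
    auto sorted = offsets.dup;
    // Edit from the last offset to the first, so earlier offsets stay valid.
    sort!"a > b"(sorted);
    string result = source;
    foreach (off; sorted)
    {
        enforce(off + oldName.length <= result.length
                && result[off .. off + oldName.length] == oldName,
                "offset does not point at the old name");
        result = result[0 .. off] ~ newName ~ result[off + oldName.length .. $];
    }
    return result;
}

void main()
{
    enum src = "enum uda1;\n@uda1 int a = 3;\n@(uda1) void foo();\n";
    size_t[] usages = [5, 12, 30]; // the enum name and the two UDAs
    write(renameAt(src, usages, "uda1", "myUda"));
}
```

If `@(uda1)` only had a location for the whole `UserAttributeDeclaration`, or none at all, the offset `30` would not be available and the tool could not do this safely.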

> For now, I will put these issues on hold.

Understandable.

--
/Jacob Carlborg