Mihaela Chirea - SAOC 2020 Milestone 2 Update 4 - Improving DMD as a Library
Jacob Carlborg
doob at me.com
Fri Nov 20 14:24:14 UTC 2020
On Thursday, 19 November 2020 at 22:07:23 UTC, Mihaela Chirea
wrote:
> - The location should start at the first byte of the AST node
> to preserve the context. Some nodes already had locations
> before I started working on this project, but I noticed that
> not all of them follow this rule. VarDeclaration and
> FuncDeclaration, for example, start at the name of the
> variable/function, skipping the type and the storage classes.
> As these locations are used for errors, I was thinking of
> adding another variable to these classes instead of replacing
> the value of this one.
Yes, I think the simplest would be to add another field. Ideally
the compiler would store only one field, the actual start of
the declaration, and then extract the location of the identifier
when it needs to report an error.
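A minimal sketch of the two-field approach, just to make it concrete. `Loc`, `VarDecl` and `startLoc` are made-up names for illustration, not the actual DMD definitions:

```d
// Hypothetical, simplified types for illustration; not the DMD front end API.
struct Loc
{
    uint line;
    uint column;
}

class VarDecl
{
    string name;
    Loc loc;      // existing field: points at the identifier, used for errors
    Loc startLoc; // assumed new field: first byte of the whole declaration

    this(string name, Loc loc, Loc startLoc)
    {
        this.name = name;
        this.loc = loc;
        this.startLoc = startLoc;
    }
}

// For `static const int a = 3;` on line 1, the declaration starts at
// column 1 while the identifier `a` sits at column 18.
VarDecl exampleDecl()
{
    return new VarDecl("a", Loc(1, 18), Loc(1, 1));
}
```

Error reporting keeps using `loc`, while tools that need the full context read `startLoc`.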
> - In the case of UserAttributeDeclaration, consecutive
> expressions are all put into the same node.
Does that apply to these forms as well:
    @uda1
    {
        int a = 3;
    }
    @uda2:
    int b = 4;
> While the expressions already have their location set, none
> contain the `@` symbol. In a situation like `@uda1 @uda2 @uda3`
> moving this location just a bit to the left wouldn't be a
> problem. However, `@(uda1, uda2, uda3)` is represented in the
> exact same way by the parser: one UserAttributeDeclaration node
> with 3 expressions. So who gets the `@` in this case? My idea
> would be to set the start location of UserAttributeDeclaration
> to the first `@` symbol in the group, and, later, set the end
> location to wherever the last symbol is. The problem with this
> approach is that all attributes, both storage classes and user
> defined, can be applied on the same variable/function, so the
> UserAttributeDeclaration range can contain more than just user
> defined attributes.
First I have to say, I do not fully understand how the AST works
for UDAs. When I tried to solve this, I noticed that
`UserAttributeDeclaration` contains an array of expressions. The
natural thing would be to store each UDA in an element of that
array. But that's not how it works, IIRC. Instead the UDAs are
stored in a tuple expression (which itself contains an array of
expressions) as the first element. Then, if enough UDAs are
declared on a declaration it will eventually start using the
second element of the array in `UserAttributeDeclaration`. I
never understood the logic around this.
Yes, that's problematic. I think what needs to happen is to add
another AST node. I'm thinking of one node type that covers all
UDAs for a single declaration, which is what
`UserAttributeDeclaration` is now, as far as I understand. Then
each individual UDA needs to have its own node type as well,
let's call it `UserAttributeDeclarationItem`. The outer node
would not need a location, but the inner nodes would.
For your above examples:
`@uda1 @uda2 @uda3 void foo();`
Would be represented as one `UserAttributeDeclaration`,
containing three `UserAttributeDeclarationItem`. Each
`UserAttributeDeclarationItem` containing one `IdentifierExp`.
`@(uda1, uda2, uda3) nothrow @(uda4, uda5, uda6) void foo();`
Would be represented as one `UserAttributeDeclaration`,
containing two `UserAttributeDeclarationItem`. Each
`UserAttributeDeclarationItem` containing three `IdentifierExp`.
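Roughly, the layout I have in mind could be sketched like this. Everything here is simplified and hypothetical: `Loc`, the constructors and the field names are assumptions, and the tree is built by hand rather than by the parser:

```d
// Hypothetical node layout for illustration; not the DMD front end API.
struct Loc
{
    uint line;
    uint column;
}

class IdentifierExp
{
    Loc loc;
    string ident;
    this(Loc loc, string ident) { this.loc = loc; this.ident = ident; }
}

// One item per `@...` occurrence; its location is the `@` symbol.
class UserAttributeDeclarationItem
{
    Loc loc;
    IdentifierExp[] exps;
    this(Loc loc, IdentifierExp[] exps) { this.loc = loc; this.exps = exps; }
}

// Covers all UDAs of one declaration; carries no location of its own.
class UserAttributeDeclaration
{
    UserAttributeDeclarationItem[] items;
    this(UserAttributeDeclarationItem[] items) { this.items = items; }
}

// Hand-built tree for:
// @(uda1, uda2, uda3) nothrow @(uda4, uda5, uda6) void foo();
UserAttributeDeclaration buildExample()
{
    auto first = new UserAttributeDeclarationItem(Loc(1, 1), [
        new IdentifierExp(Loc(1, 3), "uda1"),
        new IdentifierExp(Loc(1, 9), "uda2"),
        new IdentifierExp(Loc(1, 15), "uda3"),
    ]);
    auto second = new UserAttributeDeclarationItem(Loc(1, 29), [
        new IdentifierExp(Loc(1, 31), "uda4"),
        new IdentifierExp(Loc(1, 37), "uda5"),
        new IdentifierExp(Loc(1, 43), "uda6"),
    ]);
    return new UserAttributeDeclaration([first, second]);
}
```

With this shape the `nothrow` between the two groups does not end up inside any UDA range, since each item only spans its own `@(...)`.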
> - A StorageClassDeclaration node is not always created when
> meeting a storage class. Most nodes have a StorageClass field
> where this information is stored in the form of a ulong
> variable, and therefore there is no node to attach its location
> to. For example, `const a = 1;` generates a VarDeclaration node
> with the location at `a` and the storage class with the const
> bit set. Putting consecutive storage classes into the same node
> happens here as well.
Hmm, I see. In my opinion, either the parser needs to be modified
to output a `StorageClassDeclaration` for that variable
declaration, or the location of the variable declaration needs to
move so that it starts at the `c` in `const`.
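To illustrate why the bit mask alone is not enough, here is a small sketch. The enum members, their values and the `VarDecl` struct are assumptions made up for this example, not DMD's definitions:

```d
// Hypothetical bit flags, loosely following the idea of a ulong
// storage class mask; names and values are assumptions.
enum StorageClass : ulong
{
    none    = 0,
    const_  = 1UL << 0,
    static_ = 1UL << 1,
}

struct Loc
{
    uint line;
    uint column;
}

struct VarDecl
{
    string name;
    Loc loc;            // location of the identifier
    ulong storageClass; // records *that* `const` was written, not *where*
}

// Checks whether the const bit is set in the mask.
bool isConst(const VarDecl v)
{
    return (v.storageClass & StorageClass.const_) != 0;
}
```

For `const a = 1;` the node would be something like `VarDecl("a", Loc(1, 7), StorageClass.const_)`: a tool can tell the variable is `const`, but has no position for the keyword itself.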
> To deal with these situations, I need to know a bit more about
> the desired usecases. What information should we be able to get
> from these nodes?
I can't say in detail exactly which information should be
available. I'm thinking more at a high level, about which tools
should be possible to build. For example, I think it's reasonable
for a refactoring tool to be able to automatically change all
local variables to `const`:
    auto a = 3;
Would be turned into:
    const a = 3;
If `a` is never modified.
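As a rough sketch of what such a tool would do once it knows where the declaration starts: `replaceAutoWithConst` and the offset parameter are hypothetical, and a real tool would take the offset from the AST instead of a hard-coded value.

```d
// Illustration only: `storageClassOffset` stands in for the start
// location of the declaration as a byte offset into `source`.
string replaceAutoWithConst(string source, size_t storageClassOffset)
{
    enum from = "auto";
    // Only rewrite if the offset really points at `auto`.
    if (storageClassOffset + from.length > source.length
        || source[storageClassOffset .. storageClassOffset + from.length] != from)
        return source;
    return source[0 .. storageClassOffset] ~ "const"
        ~ source[storageClassOffset + from.length .. $];
}
```

This only works if the location points at `auto` and not at `a`, which is why the start of the declaration matters.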
As for the UDAs. A refactoring tool that can rename symbols
(these do exist in several languages today):
    enum uda1;
    @uda1 int a = 3;
    @(uda1) void foo();
Say that for the above code, you want to rename the enum. Then
the tool needs to rename all usages of that symbol. In this case,
the UDAs.
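A minimal sketch of the rewriting step, assuming the tool has already collected the location of every usage from the AST. `renameAt` and the byte offsets are hypothetical and only stand in for those locations:

```d
import std.algorithm.sorting : sort;

// Illustration only: `offsets` stands in for the identifier locations a
// tool would read from the AST (assumed to be byte offsets into `source`).
string renameAt(string source, size_t[] offsets, string oldName, string newName)
{
    auto sorted = offsets.dup;
    // Rewrite back to front so the earlier offsets stay valid.
    sort!"a > b"(sorted);
    foreach (off; sorted)
    {
        // Each offset must point exactly at an occurrence of the old name.
        assert(source[off .. off + oldName.length] == oldName);
        source = source[0 .. off] ~ newName ~ source[off + oldName.length .. $];
    }
    return source;
}
```

For this to work, each UDA expression needs a location that points at the identifier itself, not at the `@` or the opening parenthesis.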
> For now, I will put these issues on hold.
Understandable.
--
/Jacob Carlborg