DLF September 2023 Monthly Meeting Summary

Mon Nov 13 04:46:44 UTC 2023

On Monday, 13 November 2023 at 04:46:07 UTC, matheus wrote:
> ...

Part 2:

AST nodes in dmd-as-a-library

Since DConf, Razvan had been considering how dmd-as-a-library 
could offer the possibility to override any kind of AST.

He gave us this example of the expression class hierarchy used by 
the AST implementation:

module expression;
import std.stdio;

class Expression {}

class BinExp : Expression
{
     void fun()
     {
         writeln("expression.fun");
         BinExp p = new BinExp();
         Expression e = new Expression();
     }
}

class BinAssignExp : BinExp
{}

class AddAssignExp : BinAssignExp
{
     override void fun()
     {
         writeln("AddAssignExp");
     }
}

class MulAssignExp : BinAssignExp
{}

Then in ast_family.d, the default ASTCodegen looks like this:

struct ASTCodegen
{
     public import expression;
}

To create a custom AST and override the default behavior of e.g., 
BinExp, you need to do something like this:

struct MyAST
{
     import expression;
     import std.stdio;

     alias Expression = expression.Expression;
     class MyBinExp : expression.BinExp
     {
         override void fun()
         {
             writeln("MyBinExp.fun");
         }
     }

     alias BinExp = MyBinExp;
     alias BinAssignExp = ?
}

The problem with this is that you now have to declare all of the 
AST nodes that inherit from BinExp so that they use your custom 
implementation. This is not a workable solution. We need the 
ability to specify not only that we're overriding a particular 
node, but that other nodes need to use it.

First, he thought about templating the AST nodes and inheriting 
from the templated version, but that means heavily modifying the 
compiler. Then he came up with a solution using mixins. You just 
mixin the code of the AST node that you want.

With this approach, the AST nodes are now mixin templates:

module expression_mixins;
import std.stdio;

mixin template Expression_code()
{
     class Expression {}
}

mixin template BinExp_code()
{
     class BinExp : Expression
     {
         void fun()
         {
             writeln("expression.fun");
             BinExp p = new BinExp();
             Expression e = new Expression();
         }
     }
}

mixin template BinAssignExp_code()
{
     class BinAssignExp : BinExp
     {}
}

mixin template AddAssignExp_code()
{
     class AddAssignExp : BinAssignExp
     {
         override void fun()
         {
             writeln("AddAssignExp");
         }
     }
}

mixin template MulAssignExp_code()
{
     class MulAssignExp : BinAssignExp
     {}
}

And then the expression module becomes:

module expression;
import expression_mixins;
import std.stdio;

mixin Expression_code!();
mixin BinExp_code!();
mixin BinAssignExp_code!();
mixin AddAssignExp_code!();
mixin MulAssignExp_code!();

In ast_family, ASTCodegen remains the same, but now you can do 
this for your custom AST:

struct MyAst
{
     import expression_mixins;
     import std.stdio;

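     mixin Expression_code!();

     // Named mixin: the default BinExp stays reachable as t.BinExp.
     mixin BinExp_code!() t;
     class MyBinExp : t.BinExp
     {
         override void fun()
         {
             writeln("MyBinExp.fun");
         }
     }

     // Nodes mixed in below resolve BinExp to MyBinExp.
     alias BinExp = MyBinExp;
     mixin BinAssignExp_code!();
     mixin AddAssignExp_code!();
     mixin MulAssignExp_code!();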
}

We could have something in the frontend library to generate the 
boilerplate automatically. But the main thing is that now you can 
replace any default node in the hierarchy with your custom 
implementation without needing to redeclare everything. In this 
example, everything that inherits from BinExp is now going to 
inherit from MyBinExp instead. This works. He showed a runnable 
example.
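
For readers who want to try the idea, here is a condensed 
single-file sketch of the same technique. It is not Razvan's 
actual demo: the DefaultAst struct, the trimmed-down node list, 
and the main function are assumptions added for illustration.

```d
import std.stdio;

// Condensed copies of the mixin templates shown above.
mixin template Expression_code()
{
    class Expression {}
}

mixin template BinExp_code()
{
    class BinExp : Expression
    {
        void fun() { writeln("expression.fun"); }
    }
}

mixin template BinAssignExp_code()
{
    class BinAssignExp : BinExp {}
}

mixin template MulAssignExp_code()
{
    class MulAssignExp : BinAssignExp {}
}

// Hypothetical default family: every node comes straight from the mixins.
struct DefaultAst
{
    mixin Expression_code!();
    mixin BinExp_code!();
    mixin BinAssignExp_code!();
    mixin MulAssignExp_code!();
}

// Custom family: only BinExp is replaced.
struct MyAst
{
    mixin Expression_code!();

    mixin BinExp_code!() t;           // default BinExp, reachable as t.BinExp
    class MyBinExp : t.BinExp
    {
        override void fun() { writeln("MyBinExp.fun"); }
    }
    alias BinExp = MyBinExp;          // later mixins see this BinExp

    mixin BinAssignExp_code!();
    mixin MulAssignExp_code!();
}

// The descendants in MyAst pick up the custom base without being redeclared.
static assert(is(MyAst.MulAssignExp : MyAst.MyBinExp));
static assert(!is(DefaultAst.MulAssignExp : MyAst.MyBinExp));

void main()
{
    auto a = new DefaultAst.MulAssignExp;
    a.fun(); // prints "expression.fun"
    auto b = new MyAst.MulAssignExp;
    b.fun(); // prints "MyBinExp.fun"
}
```

The key point is the lookup order: a class declared inside a 
mixin template resolves its base class in the scope where the 
template is mixed in, so the alias in MyAst wins.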

He doesn't think this is that ugly. And for what it gives us, 
basically a pluggable AST, any perceived ugliness is worth it. He 
said it would be great if we could reach a consensus on how to go 
forward.

Átila said he liked it.

Razvan noted that a problem is that the semantic routine visitors 
aren't going to work anymore. But the cool thing is you can also 
put those in mixins. You mix those in with your custom AST, you 
inherit from and override whatever visiting nodes you want, and 
you get all of the functionality you need.

Timon said his main concern with this kind of scheme is that he 
has tried them in the past, and usually dmd dies when it tries to 
build itself. He thinks the current version of dmd will choke on 
this at some point. It always appears to work at the start, but 
if you scale it up enough, random stuff starts to break, like an 
"undefined identifier string", a general forward reference error, 
or an ICE, etc.

Razvan said he had encountered the undefined identifier thing, 
but you can work around that by inserting an alias in the problem 
spot. Regardless, he argued that any such error is a compiler bug 
that needs to be fixed.

Timon agreed, but his question was how do you navigate that when 
the compiler can't build itself because of a compiler bug? Razvan 
said he'd fix the bug.

Dennis noted that the bootstrap compiler wouldn't have the fix. 
Martin said you'd have to raise the bootstrap compiler version. 
He then said one thing that bothers him is that it works okay as 
long as you have one single AST. However, tools that require more 
than one AST would all need completely different class 
hierarchies. Even if you're interested in just the little BinExp, 
you might affect tens of those classes but not the hundreds of 
classes overall. You might want to share an overridden class 
among the ASTs.

Razvan said he thought it could be done. You'd declare the class 
outside of the AST family and then just use an alias inside your 
family.
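
A minimal sketch of what that could look like, assuming the 
overriding class derives from a module-level default BinExp. The 
names SharedBinExp, LintAst, and FormatAst are hypothetical, and 
name() stands in for whatever method a tool would override.

```d
// Module-level default nodes (stand-ins for the compiler's own).
class Expression {}

class BinExp : Expression
{
    string name() { return "BinExp"; }
}

// The override is declared once, outside of any AST family.
class SharedBinExp : BinExp
{
    override string name() { return "SharedBinExp"; }
}

mixin template BinAssignExp_code()
{
    // BinExp is resolved in the scope where this template is mixed in.
    class BinAssignExp : BinExp {}
}

// Two hypothetical families that reuse the same override via an alias.
struct LintAst
{
    alias Expression = .Expression;
    alias BinExp = SharedBinExp;
    mixin BinAssignExp_code!();
}

struct FormatAst
{
    alias Expression = .Expression;
    alias BinExp = SharedBinExp;
    mixin BinAssignExp_code!();
}

// Both families share one BinExp type...
static assert(is(LintAst.BinExp == FormatAst.BinExp));
// ...while the nodes mixed in below it are still per-family classes.
static assert(!is(LintAst.BinAssignExp == FormatAst.BinAssignExp));
static assert(is(LintAst.BinAssignExp : SharedBinExp));

void main()
{
    auto e = new FormatAst.BinAssignExp;
    assert(e.name() == "SharedBinExp");
}
```

Note that only the overridden class is shared here. Every node 
mixed in beneath it is still instantiated once per family, which 
is the duplication Martin was pointing at.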

Steve said his problem with this was that when you see the 
implementation in the compiler, you look at BinExpression and see 
that AddAssignExpression inherits from it, there's a possibility 
that it might be inheriting from something completely different 
from the one right above it. It's going to be hard to keep track 
of that. Especially if we have an alternate AST e.g., for 
dmd-as-a-library. Keeping track of where things go and what 
things are inheriting from is going to be confusing.

Razvan said if you're working on the compiler, there shouldn't be 
any confusion. What you see in the mixins is the implementation. 
Nothing's going to be overriding it. For users of the compiler 
library, it might be a bit confusing, but still, you're going to 
have to select the AST nodes you want to override. He expects the 
interface is going to be much simpler than all of this 
boilerplate, all the multiple mixins. It can be automated. Then 
you just have to specify which classes you want to override. 
Yeah, it may be a bit more confusing, but the way dmd is 
organized now, he doesn't see how you can just preserve the 
current state of the code and move forward.

Mathias thought it was a pretty terrible API for a very simple 
problem, but it's the only solution he's seen so far that does 
the job. He's used something like this himself in the past. So 
far, it seems to be the only solution we have.

Walter said there's another way. Instead of putting the mixins 
outside the AST nodes, put them inside. Have the AST nodes mixin 
a user-defined template. That way, features can be added in 
without turning the whole thing inside out.

Razvan said he'd thought about that also, but then you'd still 
have to define all the definitions. Walter said they could just 
be blanks in the main compiler. Razvan agreed, but users of the 
compiler library would still have to define all the AST nodes and 
then manually mixin the ones they're interested in. Relating to 
Steve's comment, this would be a much uglier approach. Because 
right now, looking at the expression implementation, the first 
instinct is that it's ugly. But it's very nicely encapsulated. 
You have the entire class there and you just plug it in your AST 
family.

He said that this has been a problem for so many years and we 
didn't have a solution for it. And this works. The good part is 
that you can implement it incrementally. You can take each AST 
node incrementally, put it in a mixin, and then just insert it in 
the AST code and see if it works. And if you have any problems 
because of compiler bugs, it's going to be super easy to track. 
Yes, it's going to require massive changes in the compiler, but 
right now, if you don't modify all of the AST nodes, it's 
impossible to override them.

Walter said he didn't understand why putting the mixins inside 
the classes rather than outside them would not also accomplish 
the same thing, but be minimally disruptive. Átila said you'd 
have to define all of the child classes. You still have to write 
them up by hand. Walter didn't think that was true. Razvan said 
when you define your AST, you'd still have to write class 
Expression and then mixin the contents. Walter said you redefine 
what you're mixing in. Steve said you can pass all the things you 
want to mixin to the class. And then it uses that to mixin the 
code.

Walter said in each semantic node you have, you list the 
inheritance and things like that, and then you mix in the 
features template. The features template would be different for 
the compiler as opposed to the user's library. Then if you want 
to modify an AST node, you just modify the mixin for that AST 
node. And that's all you'd have to do.

Razvan asked what would happen when he wanted to add another 
field or another function. Pass them as a parameter? Átila said 
that wouldn't work because you'd have to pass in as many mixins 
as you have AST nodes in the API for this, or you only pass one 
and it gets mixed into every AST node.

Walter suggested another method. We'd removed the semantic 
routines from the AST and now they're done separately. We could 
take that further and remove more of the overriding functions and 
all that. Have the AST tree look like the one in ASTBase, i.e., 
it's just the hierarchy and not much else. Then the user can 
modify that. It's just a thought that might be worth exploring. 
Just strip out all of the AST stuff that's specific to the 
compiler so it's more of a bare AST that the user can use without 
needing to modify it.

Razvan said we had this approach four years ago, but at some 
point, we just started pulling out stuff, and at that point, 
Mathias had said it wasn't obvious where this was leading us. 
Even if we did that, Razvan thinks it's orthogonal because you 
still won't have the possibility to add new fields to the AST 
unless you also use it in conjunction with a mixin solution.

Walter agreed it doesn't add new fields. Átila asked if you'd 
need to. Wouldn't it just be that you'd write your own visitor? 
Why would you need to modify the AST at that point?

Razvan: Because the AST is currently being used by the semantic 
routines. The parser expects a specific AST structure. So if you 
want to add some fields to just store some info for use during 
semantic analysis...

Timon said having mixed-in features per class works because you 
can do static if based on the type of this to see in which node 
you are, and you can inject your code exactly into the right 
node. But this is exactly what his experimental compiler frontend 
hobby project was doing and it completely broke with dmd 61.
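
To make the "features template inside each node" idea concrete, 
here is a small sketch. It is an illustration only: the Features 
template, the visitCount field, and the three-node hierarchy are 
assumptions, and it selects the node by name via 
__traits(identifier, typeof(this)) rather than by comparing types.

```d
// Hypothetical features template. In the compiler's own build this
// would be empty; a library user would supply a version like this one.
mixin template Features()
{
    // typeof(this) is the class the template is being mixed into,
    // so code can be injected into exactly one node.
    static if (__traits(identifier, typeof(this)) == "BinExp")
    {
        int visitCount; // extra state wanted only on BinExp
    }
}

// Each node lists its inheritance as usual and mixes in the features.
class Expression
{
    mixin Features!();
}

class BinExp : Expression
{
    mixin Features!();
}

class BinAssignExp : BinExp
{
    mixin Features!();
}

// The field was injected into BinExp and nowhere above it.
static assert(__traits(hasMember, BinExp, "visitCount"));
static assert(!__traits(hasMember, Expression, "visitCount"));

void main()
{
    auto b = new BinAssignExp; // inherits the injected field from BinExp
    b.visitCount++;
    assert(b.visitCount == 1);
}
```

A single template serves every node, which addresses Átila's 
objection about needing one mixin parameter per AST node, but it 
says nothing about the scaling problems Timon described.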

Dennis said he wasn't sure yet what application of 
dmd-as-a-library really requires you to override classes. Instead 
of trying to statically enhance the classes, what if every AST 
node had either an identifier or just an extra void pointer for 
library things? A library could then dynamically cast and read 
that field without heavy template and mixin machinery.
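
A rough sketch of the void pointer variant, under the assumption 
that the hook is a plain field on the root node. The userData 
field, the LintInfo struct, and both helper functions are 
hypothetical names, not anything that exists in dmd.

```d
// Hypothetical root node with one opaque slot reserved for library use.
class Expression
{
    void* userData; // the compiler itself never looks at this
}

class BinExp : Expression {}

// Tool-side state, unknown to the compiler.
struct LintInfo
{
    int timesVisited;
}

// Attach the tool's state on first use, then update it.
void visit(Expression e)
{
    if (e.userData is null)
        e.userData = new LintInfo;
    auto info = cast(LintInfo*) e.userData;
    info.timesVisited++;
}

// Read the state back; nodes that were never visited report zero.
int visits(Expression e)
{
    if (e.userData is null)
        return 0;
    return (cast(LintInfo*) e.userData).timesVisited;
}

void main()
{
    auto e = new BinExp;
    visit(e);
    visit(e);
    assert(visits(e) == 2);
    assert(visits(new BinExp) == 0);
}
```

The cast is unchecked, so this only stays safe if a single tool 
owns the field or the tools agree on what it points to.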

Razvan cited an example from the unused import tool. There, when 
the compiler does name resolution, it has a search method in the 
scope of the symbol class. He'd just need to override that method 
and do something a little bit extra, like store some information 
or do something a bit different.

Here the conversation veered off a bit into the use of 
virtual/final functions in dmd, how things would need to change 
for dmd-as-a-library, and the possibility of breaking user code 
with API changes. Razvan noted Luís Ferreira was excited about 
the possibility of modifying AST nodes for a personal project of 
his that might end up being used by Weka. Right now, he has to 
jump through a lot of hoops because we don't provide a proper 
interface.

Razvan said he's willing to work on this and make some PRs to dmd 
to see if this can work or if he's just hitting roadblocks. He 
asked if Walter would approve of exploring it or thought it was 
just too ugly. He said of course Walter could explore the other 
solution, too. But it looks like it's either this one or that 
one. We don't have too many options.

Átila repeated he liked it. Walter asked how other compilers do 
it. That brought up some talk about clang and the Java compiler.

Martin said he wouldn't worry too much about breaking the 
compiler's API for now. We're at the complete starting point of 
being able to use it as a library. This is the first baby step. 
So we shouldn't worry about huge changes in the API. Once we have 
stable tools depending on it, that's a different story. LLVM has 
breaking API changes. Sometimes they even transition over three 
versions with different deprecation steps. He doesn't see a 
problem with D doing the same.

Walter was concerned this is a wrenching change to the compiler 
internals and isn't sure it's worth it. There are a lot of AST 
nodes in the compiler, and rewriting them all...

Razvan said that for him, this isn't any different than when we 
templated the parser or when we moved out semantic routines. 
Walter said templating the parser kind of failed. Razvan said 
that's because we didn't follow up with semantic. We templated 
the parser, but libdparse already existed, so we weren't offering 
anything new. There are a bunch of libraries out there doing bits 
of what dmd is already doing, but this is kind of a different 
scale.

Walter said he understands what it does, he's just concerned 
about it. He didn't understand what people wanted to do with 
dmd-as-a-library. He wasn't sure about what capabilities were 
needed. He was also concerned that anyone wanting to use 
dmd-as-a-library would have to learn too much about the compiler 
to use it effectively. He kind of likes the void* approach where 
a hook can be added without the compiler knowing anything about 
it. He was just reluctant to endorse something he didn't 
understand. And that's a large change in the compiler.

Timon asked why we didn't just template the semantic analysis on 
the AST family and then let the library go wild with having its 
own hookable AST. That shouldn't interfere with the standard 
compilation of the compiler. You wouldn't edit the existing AST 
nodes.
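
A small sketch of what templating a pass on the AST family might 
look like. It assumes a free function taking the family as a 
template parameter; HookedAST, describe(), and the toy semantic 
function are hypothetical and far simpler than the real semantic 
routines.

```d
// Default family, left untouched.
struct ASTCodegen
{
    class Expression
    {
        string describe() { return "Expression"; }
    }

    class BinExp : Expression
    {
        override string describe() { return "BinExp"; }
    }
}

// Library-side family with its own hookable node.
struct HookedAST
{
    alias Expression = ASTCodegen.Expression;

    class BinExp : ASTCodegen.BinExp
    {
        override string describe() { return "HookedAST.BinExp"; }
    }
}

// The pass only ever names nodes through the AST parameter.
string semantic(AST)(AST.Expression e)
{
    if (auto b = cast(AST.BinExp) e)
        return "binary: " ~ b.describe();
    return "other: " ~ e.describe();
}

void main()
{
    // The compiler would instantiate the pass with the default family...
    assert(semantic!ASTCodegen(new ASTCodegen.BinExp) == "binary: BinExp");
    // ...and a tool with its own.
    assert(semantic!HookedAST(new HookedAST.BinExp)
        == "binary: HookedAST.BinExp");
}
```

This is the same shape as the templated parser: the compiler keeps 
instantiating everything with the default family, and only library 
users pay for a second instantiation.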

Walter said that if the compiler is properly modularized, it 
isn't necessary to template this stuff. You just import a 
different module for different behaviors if things are properly 
encapsulated. One thing we might look at is seeing if we can 
reduce the number of imports the AST nodes do. Try to make them 
more pluggable. So instead of using templates and all of that 
stuff, just import a different module with your own feature in 
it. There are lots of ways to do it.

Timon asked how you'd tell the existing module to please import 
my module instead of the existing one. That would still need 
templating. Walter said you'd do that by having your module in a 
different directory and using the -I switch to point the compiler 
to it. Then you've got a different implementation.

Razvan said that's not a library anymore. You'd have to replace 
the existing files with your files. That doesn't solve the issue.

Walter: So what you're saying is it should be a compiled 
dmd-as-a-library or a source code dmd-as-a-library.

Razvan said yes. If you're modifying the files, you might as well 
just be using a fork of the compiler. What's the point? You want 
to stay up to date with the latest compiler and you want to have 
all the features there and just plug in what you want to use.

Adam said you don't modify the files, you replace them entirely 
in the build system. But that is a pain. Razvan said he wanted to 
reuse as much as he could of the existing code, so a fair amount 
of copy-paste would be required for a specific file.

Walter said he understood what Razvan was saying, but if we 
better encapsulate the modules, staying up to date with the rest 
of the compiler becomes much less of a problem. For example, he'd 
finally gotten the lexer and parser to not import the rest of the 
compiler. Now somebody can plug in a different lexer and a 
different parser and then use the rest of the compiler 
unmodified. He didn't see any fundamental reason why it couldn't 
be done with AST nodes. Just import a different statement.d file 
if you want to change the statement nodes. If it's properly 
encapsulated, then it's easy to just replace the compiler's 
statement.d with the user's version. And they can tweak it as 
they see fit.

Timon said the fundamental problem to be solved is how to make 
semantic reusable on many different kinds of AST nodes. He wasn't 
convinced that replacing D files in the build system was a good 
interface for that. Maybe there was some way to marry the two 
approaches that wasn't too devastating for either of them.

Walter said there had been some discussion about replacing 
build.d with a single command to dmd that just simply reads all 
the files in the directory and compiles them all together. He 
thought that was an interesting idea and that it would make it 
much easier to build things out of the dmd source. He agreed that 
build.d was overly complex for what it does, which should be just 
compiling the files. Why is it this massive, complicated thing?

Átila said what people ideally would like to do is just declare 
in a dub.json/sdl file that dmd is a dependency and have it just 
work. 
Walter agreed.

Martin said this was exactly the same discussion as the one we'd 
had before about putting linter stuff in the frontend. Ditto for 
the decision to keep the check of unreachable code in the 
compiler just so tools can use it. Replacing the D files wasn't 
going to cut it. We were going to need something in between. 
There will be a fork of the compiler as a base for 
dmd-as-a-library. There, we could do the mixins, void*, or 
whatever so that it can be used. It's a somewhat opened-up 
interface to the frontend with some slight modifications. So you 
could add modifications like void*, or add some extra state to 
all of the AST nodes, extra visitors, or whatever extra 
functionality we need which would cost us and which we don't want 
to see in the frontend.

For example, if we have to remove the final attribute from some 
of the performance-critical methods just to be able to use the 
frontend as it is for arbitrary tools, we could just do it in the 
fork. That would be a viable approach to have dmd-as-a-library as 
a separate project on GitHub. We can update with every major or 
minor version and tools could build on top of that. Then we don't 
have to add all the extra stuff in the compiler itself and make 
it slower or uglier in the process. That's similar to how LDC 
uses dmd as a frontend. They rewrite the history of the monorepo, 
exclude some stuff, move some files to other places so they can 
use them more conveniently, and have some little adaptations like 
extra fields in some places, replace functions, etc. It works. 
Keeping up with the latest dmd changes isn't too bad. If that was 
dealt with by the community team, similar to how we maintain dub, 
we should be good.

Walter said he was glad Martin had brought that up. He suddenly 
realized Martin and Iain were already using dmd as a library, so 
their experience would be extremely informative on how to better 
do this. And maybe by better supporting them, we'd be implicitly 
better supporting dmd as a library. He didn't know how they were 
integrating it into LDC and GDC. He said he was sorry he hadn't 
thought of that before.

Martin said LDC and GDC were obviously modifying some parts of D, 
at least LDC was, but only on very few occasions. With the rest of 
the interface, they were interoperating with C++. That's a 
special case for them and not comparable to other D tools that 
would use dmd-as-a-library directly. He'd said before that it was 
all working quite nicely. You couldn't do anything much better to 
simplify his life in this regard. Some people had said they'd 
like to see LDC modifications upstreamed, but he didn't share 
that view; he wasn't proud of having all of those special case 
additions in LDC that they do for the frontend editions. They 
were probably added 10 years ago by Kai or someone. They're ugly, 
but they're working. He wouldn't want to see those upstreamed 
into dmd, nor all the LDC intrinsics upstreamed into DRuntime.

Mathias said those aren't special cases, they're use cases for 
dmd-as-a-library. Martin said yes, but he didn't see the 
connection. Mathias said they're good examples of the kinds of 
features people would need for dmd-as-a-library. Currently, 
they're hacks because you put them in an LDC version block and 
modify the code. But by upstreaming them, you'd be turning the 
hacks into proper configuration points, which is the point.

Martin agreed. His point was he didn't want to uglify the code 
and reduce the readability of the runtime and the compiler's 
source. The same thing applies to the linters or these planned 
extension points. So we could keep the frontend as it is and have 
that intermediate dmd-as-a-library project with some 
modifications to simplify some of the hooks, add extra state, and 
maybe make the whole AST replacement if we need to replace that 
functionality.

He continued by reiterating that Walter's suggestion of replacing 
D files is obviously not what you want. You want to reuse most of 
the functionality. Like the example Razvan brought up with the 
unused imports. You just need to replace one little search 
function. So having dmd-as-a-library as its own little project to 
start hacking from is quite fine. And we can see how it evolves.

There was a side discussion about how the extern(C++) interface 
affects dmd-as-a-library. Then I called on Steve, who had raised 
his hand sometime before, but I suggested we should wrap this 
discussion up soon, as it didn't appear there was going to be a 
clear point where we could do that.

Steven said this was very similar to the discussions we'd had 
about a new Phobos version, i.e., reusing code that is compiled 
in a completely different way but is the same code and we don't 
want to make copies of it. We should think about the experience 
from that and how we didn't end up with a good solution. Átila 
agreed.

I said we should do what Razvan had suggested and take this 
offline. I asked Razvan to start an email with everyone who might 
have feedback CC'ed. And maybe we could establish a base from 
which to work in a future monthly meeting or planning session. 
Everyone agreed.

(UPDATE: There was some progress on this. Some separate meetings 
focused on dmd-as-a-library were held in October. I did not 
participate in these, but I'll post an update on what transpired 
once I've caught up on the other summaries.)