DLF September 2023 Monthly Meeting Summary
matheus
matheus at gmail.com
Mon Nov 13 04:46:44 UTC 2023
On Monday, 13 November 2023 at 04:46:07 UTC, matheus wrote:
> ...
Part 2:
AST nodes in dmd-as-a-library
Since DConf, Razvan had been considering how dmd-as-a-library
could offer the possibility to override any kind of AST node.
He gave us this example of the expression class hierarchy used by
the AST implementation:
    module expression;

    import std.stdio;

    class Expression {}

    class BinExp : Expression
    {
        void fun()
        {
            writeln("expression.fun");
            BinExp p = new BinExp();
            Expression e = new Expression();
        }
    }

    class BinAssignExp : BinExp
    {}

    class AddAssignExp : BinAssignExp
    {
        override void fun()
        {
            writeln("AddAssignExp");
        }
    }

    class MulAssignExp : BinAssignExp
    {}
Then in ast_family.d, the default ASTCodegen looks like this:
    struct ASTCodegen
    {
        public import expression;
    }
To create a custom AST and override the default behavior of, e.g.,
BinExp, you need to do something like this:
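    struct MyAST
    {
        import expression;
        import std.stdio;

        alias Expression = expression.Expression;

        class MyBinExp : expression.BinExp
        {
            override void fun()
            {
                writeln("MyBinExp.fun");
            }
        }
        alias BinExp = MyBinExp;

        // alias BinAssignExp = ?
        // expression.BinAssignExp still derives from expression.BinExp,
        // not MyBinExp, so every subclass would have to be redeclared here.
    }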
The problem with this is that you now have to declare all of the
AST nodes that inherit from BinExp so that they use your custom
implementation. This is not a workable solution. We need the
ability to specify not only that we're overriding a particular
node, but that other nodes need to use it.
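A minimal single-file sketch of the problem follows. It assumes that all modules are collapsed into one file and that fun returns a string instead of printing, so the result can be checked. Aliasing BinExp inside MyAST does nothing for classes that already derive from the original BinExp:

```d
// Sketch: in the plain class design, aliasing one node does not reach its
// existing subclasses. Single file; fun() returns a string for checking.
import std.stdio;

class Expression {}

class BinExp : Expression
{
    string fun() { return "expression.fun"; }
}

class BinAssignExp : BinExp {}

struct MyAST
{
    class MyBinExp : .BinExp
    {
        override string fun() { return "MyBinExp.fun"; }
    }
    alias BinExp = MyBinExp;

    // The only option short of redeclaring it: reuse the original class,
    // which still derives from the module-level BinExp, not MyBinExp.
    alias BinAssignExp = .BinAssignExp;
}

// MyAST.BinAssignExp is not a MyBinExp, so the override is never inherited.
static assert(!is(MyAST.BinAssignExp : MyAST.MyBinExp));

void main()
{
    auto e = new MyAST.BinAssignExp();
    writeln(e.fun()); // prints "expression.fun", not "MyBinExp.fun"
}
```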
First, he thought about templating the AST nodes and inheriting
from the templated version, but that means heavily modifying the
compiler. Then he came up with a solution using mixins. You just
mixin the code of the AST node that you want.
With this approach, the AST nodes are now mixin templates:
    module expression_mixins;

    import std.stdio;

    mixin template Expression_code()
    {
        class Expression {}
    }

    mixin template BinExp_code()
    {
        class BinExp : Expression
        {
            void fun()
            {
                writeln("expression.fun");
                BinExp p = new BinExp();
                Expression e = new Expression();
            }
        }
    }

    mixin template BinAssignExp_code()
    {
        class BinAssignExp : BinExp
        {}
    }

    mixin template AddAssignExp_code()
    {
        class AddAssignExp : BinAssignExp
        {
            override void fun()
            {
                writeln("AddAssignExp");
            }
        }
    }

    mixin template MulAssignExp_code()
    {
        class MulAssignExp : BinAssignExp
        {}
    }
And then the expression module becomes:
    module expression;

    import expression_mixins;
    import std.stdio;

    mixin Expression_code!();
    mixin BinExp_code!();
    mixin BinAssignExp_code!();
    mixin AddAssignExp_code!();
    mixin MulAssignExp_code!();
In ast_family.d, ASTCodegen remains the same, but now you can do
this for your custom AST:
    struct MyAst
    {
        import expression_mixins;
        import std.stdio;

        mixin Expression_code!();
        mixin BinExp_code!() t;

        class MyBinExp : t.BinExp
        {
            override void fun()
            {
                writeln("MyBinExp.fun");
            }
        }
        // Shadows t.BinExp, so the mixins below derive from MyBinExp.
        alias BinExp = MyBinExp;

        mixin BinAssignExp_code!();
        mixin AddAssignExp_code!();
        mixin MulAssignExp_code!();
    }
We could have something in the frontend library to generate the
boilerplate automatically. But the main thing is that now you can
replace any default node in the hierarchy with your custom
implementation without needing to redeclare everything. In this
example, everything that inherits from BinExp is now going to
inherit from MyBinExp instead. This works. He showed a runnable
example.
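A single-file approximation of the mixin layout above might look like the following. It assumes that everything lives in one module and that fun returns a string so the result can be checked. The static asserts confirm that in MyAst, MulAssignExp now derives from MyBinExp, while ASTCodegen is unaffected:

```d
// Single-file sketch of the mixin-based layout: node bodies live in mixin
// templates, and each AST family mixes them in.
import std.stdio;

mixin template Expression_code() { class Expression {} }

mixin template BinExp_code()
{
    class BinExp : Expression
    {
        string fun() { return "expression.fun"; }
    }
}

mixin template BinAssignExp_code() { class BinAssignExp : BinExp {} }

mixin template AddAssignExp_code()
{
    class AddAssignExp : BinAssignExp
    {
        override string fun() { return "AddAssignExp"; }
    }
}

mixin template MulAssignExp_code() { class MulAssignExp : BinAssignExp {} }

// Default family: every node mixed in unchanged.
struct ASTCodegen
{
    mixin Expression_code!();
    mixin BinExp_code!();
    mixin BinAssignExp_code!();
    mixin AddAssignExp_code!();
    mixin MulAssignExp_code!();
}

// Custom family: only BinExp is replaced; its subclasses follow automatically.
struct MyAst
{
    mixin Expression_code!();
    mixin BinExp_code!() t;
    class MyBinExp : t.BinExp
    {
        override string fun() { return "MyBinExp.fun"; }
    }
    alias BinExp = MyBinExp; // takes precedence over the mixed-in t.BinExp
    mixin BinAssignExp_code!();
    mixin AddAssignExp_code!();
    mixin MulAssignExp_code!();
}

static assert(is(MyAst.MulAssignExp : MyAst.MyBinExp));
static assert(!is(ASTCodegen.MulAssignExp : MyAst.MyBinExp));

void main()
{
    writeln((new ASTCodegen.MulAssignExp()).fun()); // expression.fun
    writeln((new MyAst.MulAssignExp()).fun());      // MyBinExp.fun
    writeln((new MyAst.AddAssignExp()).fun());      // AddAssignExp
}
```

The override works because D lets a declaration in the enclosing scope (here, the alias BinExp) take precedence over a same-named declaration introduced by a template mixin. The mixins that follow, which refer to BinExp, therefore pick up the alias.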
He doesn't think this is that ugly. And for what it gives us,
basically a pluggable AST, any perceived ugliness is worth it. He
said it would be great if we could reach a consensus on how to go
forward.
Átila said he liked it.
Razvan noted that a problem is that the semantic routine visitors
aren't going to work anymore. But the cool thing is you can also
put those in mixins. You mix those in with your custom AST, you
inherit from and override whatever visiting nodes you want, and
you get all of the functionality you need.
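A rough sketch of a visitor shipped as a mixin follows. The names Visitor_code, dispatch, and visitBinExp are hypothetical. dmd's real visitors use accept() methods on the nodes, but this sketch dispatches with dynamic casts for brevity. The family mixes the visitor in after its nodes, and a custom visitor overrides only the node it cares about:

```d
// Hypothetical sketch: a visitor shipped as a mixin template, mixed into each
// AST family after its nodes. Dispatch uses dynamic casts for brevity.
import std.stdio;

mixin template Expression_code() { class Expression {} }
mixin template BinExp_code() { class BinExp : Expression {} }

mixin template Visitor_code()
{
    class Visitor
    {
        // Route on the dynamic node type, most derived class first.
        final string dispatch(Expression e)
        {
            if (auto b = cast(BinExp) e)
                return visitBinExp(b);
            return visitExpression(e);
        }

        string visitExpression(Expression e) { return "default Expression"; }
        string visitBinExp(BinExp e) { return "default BinExp"; }
    }
}

struct MyAst
{
    mixin Expression_code!();
    mixin BinExp_code!();
    mixin Visitor_code!();

    // Inherit the family's visitor and override only one node.
    class MyVisitor : Visitor
    {
        override string visitBinExp(BinExp e) { return "custom BinExp"; }
    }
}

void main()
{
    auto v = new MyAst.MyVisitor();
    writeln(v.dispatch(new MyAst.Expression())); // default Expression
    writeln(v.dispatch(new MyAst.BinExp()));     // custom BinExp
}
```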
Timon said his main concern with this kind of scheme is that he
has tried such schemes in the past, and usually dmd dies when it tries to
build itself. He thinks the current version of dmd will choke on
this at some point. It always appears to work at the start, but
if you scale it up enough, random things start to break: an
"undefined identifier string" error, a general forward reference error,
an ICE, and so on.
Razvan said he had encountered the undefined identifier thing,
but you can work around that by inserting an alias in the problem
spot. Regardless, he argued that any such error is a compiler bug
that needs to be fixed.
Timon agreed, but his question was how do you navigate that when
the compiler can't build itself because of a compiler bug? Razvan
said he'd fix the bug.
Dennis noted that the bootstrap compiler wouldn't have the fix.
Martin said you'd have to raise the bootstrap compiler version.
He then said one thing that bothers him is that it works okay as
long as you have one single AST. However, tools that require more
than one AST would all need completely different class
hierarchies. Even if you're interested in just the little BinExp,
you might affect tens of those classes but not the hundreds of
classes overall. You might want to share an overridden class
among the ASTs.
Razvan said he thought it could be done. You'd declare the class
outside of the AST family and then just use an alias inside your
family.
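One way to read Razvan's suggestion is the following sketch, in which two hypothetical tool families (ToolA, ToolB) share one overridden node. The assumption is that the shared class derives from the default family's BinExp and that each family re-exports the default Expression:

```d
// Hypothetical sketch: one overridden node shared by two AST families.
import std.stdio;

mixin template Expression_code() { class Expression {} }
mixin template BinExp_code()
{
    class BinExp : Expression
    {
        string fun() { return "BinExp.fun"; }
    }
}
mixin template BinAssignExp_code() { class BinAssignExp : BinExp {} }

// The default family.
struct ASTCodegen
{
    mixin Expression_code!();
    mixin BinExp_code!();
    mixin BinAssignExp_code!();
}

// The override lives outside any family...
class SharedBinExp : ASTCodegen.BinExp
{
    override string fun() { return "SharedBinExp.fun"; }
}

// ...and each tool's family aliases it in before mixing in the subclasses.
struct ToolA
{
    alias Expression = ASTCodegen.Expression;
    alias BinExp = SharedBinExp;
    mixin BinAssignExp_code!();
}

struct ToolB
{
    alias Expression = ASTCodegen.Expression;
    alias BinExp = SharedBinExp;
    mixin BinAssignExp_code!();
}

// Distinct subclasses in each family, one shared base.
static assert(is(ToolA.BinAssignExp : SharedBinExp));
static assert(is(ToolB.BinAssignExp : SharedBinExp));
static assert(!is(ToolA.BinAssignExp == ToolB.BinAssignExp));

void main()
{
    writeln((new ToolA.BinAssignExp()).fun()); // SharedBinExp.fun
    writeln((new ToolB.BinAssignExp()).fun()); // SharedBinExp.fun
}
```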
Steve said his problem with this was that when you look at the
implementation in the compiler and see that AddAssignExpression
inherits from BinExpression, it might actually be inheriting from
something completely different from the class defined right above
it. It's going to be hard to keep track of that, especially if we
have an alternate AST, e.g., for dmd-as-a-library. Keeping track of
where things go and what they inherit from is going to be confusing.
Razvan said if you're working on the compiler, there shouldn't be
any confusion. What you see in the mixins is the implementation.
Nothing's going to be overriding it. For users of the compiler
library, it might be a bit confusing, but still, you're going to
have to select the AST nodes you want to override. He expects the
interface is going to be much simpler than all of this
boilerplate, all the multiple mixins. It can be automated. Then
you just have to specify which classes you want to override.
Yeah, it may be a bit more confusing, but the way dmd is
organized now, he doesn't see how you can just preserve the
current state of the code and move forward.
Mathias thought it was a pretty terrible API for a very simple
problem, but it's the only solution he's seen so far that does
the job. He's used something like this himself in the past. So
far, it seems to be the only solution we have.
Walter said there's another way. Instead of putting the mixins
outside the AST nodes, put them inside. Have the AST nodes mixin
a user-defined template. That way, features can be added in
without turning the whole thing inside out.
Razvan said he'd thought about that also, but then you'd still
have to write out all the class definitions. Walter said they could just
be blanks in the main compiler. Razvan agreed, but users of the
compiler library would still have to define all the AST nodes and
then manually mixin the ones they're interested in. Relating to
Steve's comment, this would be a much uglier approach. Because
right now, looking at the expression implementation, the first
instinct is that it's ugly. But it's very nicely encapsulated.
You have the entire class there and you just plug it in your AST
family.
He said that this has been a problem for so many years and we
didn't have a solution for it. And this works. The good part is
that you can implement it incrementally. You can take each AST
node incrementally, put it in a mixin, and then just insert it in
the AST code and see if it works. And if you have any problems
because of compiler bugs, it's going to be super easy to track.
Yes, it's going to require massive changes in the compiler, but
right now, if you don't modify all of the AST nodes, it's
impossible to override them.
Walter said he didn't understand why putting the mixins inside
the classes rather than outside them would not also accomplish
the same thing, but be minimally disruptive. Átila said you'd
have to define all of the child classes. You still have to write
them up by hand. Walter didn't think that was true. Razvan said
when you define your AST, you'd still have to write class
Expression and then mixin the contents. Walter said you redefine
what you're mixing in. Steve said you can pass all the things you
want to mixin to the class. And then it uses that to mixin the
code.
Walter said in each semantic node you have, you list the
inheritance and things like that, and then you mix in the
features template. The features template would be different for
the compiler as opposed to the user's library. Then if you want
to modify an AST node, you just modify the mixin for that AST
node. And that's all you'd have to do.
Razvan asked what would happen when he wanted to add another
field or another function. Pass them as a parameter? Átila said
that wouldn't work because you'd have to pass in as many mixins
as you have AST nodes in the API for this, or you only pass one
and it gets mixed into every AST node.
Walter suggested another method. We'd removed the semantic
routines from the AST and now they're done separately. We could
take that further and remove more of the overriding functions and
all that. Have the AST tree look like the one in ASTBase, i.e.,
it's just the hierarchy and not much else. Then the user can
modify that. It's just a thought that might be worth exploring.
Just strip out all of the AST stuff that's specific to the
compiler so it's more of a bare AST that the user can use without
needing to modify it.
Razvan said we had this approach four years ago, but at some
point, we just started pulling out stuff, and at that point,
Mathias had said it wasn't obvious where this was leading us.
Even if we did that, Razvan thinks it's orthogonal because you
still won't have the possibility to add new fields to the AST
unless you also use it in conjunction with a mixin solution.
Walter agreed it doesn't add new fields. Átila asked if you'd
need to. Wouldn't it just be that you'd write your own visitor?
Why would you need to modify the AST at that point?
Razvan: Because the AST is currently being used by the semantic
routines. The parser expects a specific AST structure. So if you
want to add some fields to just store some info for use during
semantic analysis...
Timon said having mixed-in features per class works because you
can do static if based on the type of this to see in which node
you are, and you can inject your code exactly into the right
node. But this is exactly what his experimental compiler frontend
hobby project was doing and it completely broke with dmd 61.
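Walter's features-template idea, combined with the static if on typeof(this) that Timon describes, might look roughly like the following. It is only a sketch: the names Nodes, NoFeatures, ToolFeatures, and note are illustrative, and the features template is passed as an alias parameter only so that both builds can coexist in one file:

```d
// Hypothetical sketch: the hierarchy is written once and every node mixes in
// a features template. Features is an alias parameter here only so that two
// builds can live in one file.
import std.stdio;

struct Nodes(alias Features)
{
    class Expression { mixin Features!(); }
    class BinExp : Expression { mixin Features!(); }
    class AddAssignExp : BinExp { mixin Features!(); }
}

// The compiler's build: no extra features.
mixin template NoFeatures() {}

// A tool's build: static if on typeof(this) injects state into one node only.
mixin template ToolFeatures()
{
    static if (__traits(identifier, typeof(this)) == "BinExp")
        string note = "extra state on BinExp";
}

alias CompilerAST = Nodes!NoFeatures;
alias ToolAST = Nodes!ToolFeatures;

// Subclasses of BinExp inherit the injected field; other nodes do not get it.
static assert(__traits(hasMember, ToolAST.AddAssignExp, "note"));
static assert(!__traits(hasMember, ToolAST.Expression, "note"));
static assert(!__traits(hasMember, CompilerAST.BinExp, "note"));

void main()
{
    auto e = new ToolAST.AddAssignExp();
    writeln(e.note); // extra state on BinExp
}
```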
Dennis said he wasn't sure yet what application of
dmd-as-a-library really requires you to override classes. Instead
of trying to statically enhance the classes, what if every AST
node had either an identifier or just an extra void pointer for
library things? A library could then dynamically cast and read
that field without heavy template and mixin machinery.
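Dennis's extra-pointer idea needs no template machinery, as the following minimal sketch shows. The field libraryData and the UsageInfo type are hypothetical. Note that casting back from void* is unchecked, so the tool must know what it stored there:

```d
// Hypothetical sketch: each node carries an opaque pointer the compiler never
// touches; a library hangs its own data off it.
import std.stdio;

class Expression
{
    void* libraryData; // owned by whichever tool is using the frontend
}

class BinExp : Expression {}

// Tool-side state the compiler knows nothing about.
struct UsageInfo
{
    int timesVisited;
}

void recordVisit(Expression e)
{
    if (e.libraryData is null)
        e.libraryData = new UsageInfo();
    // Unchecked cast: the tool must know what it stored in the field.
    (cast(UsageInfo*) e.libraryData).timesVisited++;
}

int visitCount(Expression e)
{
    auto info = cast(UsageInfo*) e.libraryData;
    return info is null ? 0 : info.timesVisited;
}

void main()
{
    Expression e = new BinExp();
    recordVisit(e);
    recordVisit(e);
    writeln(visitCount(e)); // 2
}
```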
Razvan cited an example from the unused import tool. There, when
the compiler does name resolution, it has a search method in the
scope of the symbol class. He'd just need to override that method
and do something a little bit extra, like store some information
or do something a bit different.
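The kind of hook Razvan describes can be sketched with a virtual lookup method that a tool overrides to do "a little bit extra". ImportScope, TrackingImportScope, and resolve are illustrative stand-ins and are not dmd's actual Scope or Dsymbol API:

```d
// Hypothetical sketch: the compiler's lookup is a virtual method, and an
// unused-import tool overrides it to record which imports resolved a name.
import std.stdio;

class ImportScope
{
    string moduleName;
    bool[string] symbols;

    this(string moduleName, string[] names...)
    {
        this.moduleName = moduleName;
        foreach (n; names)
            symbols[n] = true;
    }

    // Default lookup: does this import provide `name`?
    bool search(string name)
    {
        return (name in symbols) !is null;
    }
}

// The tool's override: same lookup plus a little bookkeeping.
class TrackingImportScope : ImportScope
{
    bool used;

    this(string moduleName, string[] names...)
    {
        super(moduleName, names);
    }

    override bool search(string name)
    {
        immutable found = super.search(name);
        if (found)
            used = true;
        return found;
    }
}

// Compiler-side name resolution: only ever sees the base class.
bool resolve(ImportScope[] imports, string name)
{
    foreach (imp; imports)
        if (imp.search(name))
            return true;
    return false;
}

void main()
{
    auto stdioImp = new TrackingImportScope("std.stdio", "writeln", "File");
    auto algoImp = new TrackingImportScope("std.algorithm", "map", "filter");
    ImportScope[] imports;
    imports ~= stdioImp;
    imports ~= algoImp;

    resolve(imports, "writeln"); // the module only uses writeln

    foreach (imp; [stdioImp, algoImp])
        if (!imp.used)
            writeln("unused import: ", imp.moduleName); // std.algorithm
}
```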
Here the conversation veered off a bit into the use of
virtual/final functions in dmd, how things would need to change
for dmd-as-a-library, and the possibility of breaking user code
with API changes. Razvan noted Luís Ferreira was excited about
the possibility of modifying AST nodes for a personal project of
his that might end up being used by Weka. Right now, he has to
jump through a lot of hoops because we don't provide a proper
interface.
Razvan said he's willing to work on this and make some PRs to dmd
to see if this can work or if he's just hitting roadblocks. He
asked if Walter would approve of exploring it or thought it was
just too ugly. He said of course Walter could explore the other
solution, too. But it looks like it's either this one or that
one. We don't have too many options.
Átila repeated he liked it. Walter asked how other compilers do
it. That brought up some talk about clang and the Java compiler.
Martin said he wouldn't worry too much about breaking the
compiler's API for now. We're at the complete starting point of
being able to use it as a library. This is the first baby step.
So we shouldn't worry about huge changes in the API. Once we have
stable tools depending on it, that's a different story. LLVM has
breaking API changes. Sometimes they even transition over three
versions with different deprecation steps. He doesn't see a
problem with D doing the same.
Walter was concerned this is a wrenching change to the compiler
internals and isn't sure it's worth it. There are a lot of AST
nodes in the compiler, and rewriting them all...
Razvan said that for him, this isn't any different than when we
templated the parser or when we moved out semantic routines.
Walter said templating the parser kind of failed. Razvan said
that's because we didn't follow up with semantic. We templated
the parser, but libdparse already existed, so we weren't offering
anything new. There are a bunch of libraries out there doing bits
of what dmd is already doing, but this is kind of a different
scale.
Walter said he understands what it does, he's just concerned
about it. He didn't understand what people wanted to do with
dmd-as-a-library. He wasn't sure about what capabilities were
needed. He was also concerned that anyone wanting to use
dmd-as-a-library would have to learn too much about the compiler
to use it effectively. He kind of likes the void* approach where
a hook can be added without the compiler knowing anything about
it. He was just reluctant to endorse something he didn't
understand. And that's a large change in the compiler.
Timon asked why we didn't just template the semantic analysis on
the AST family and then let the library go wild with having its
own hookable AST. That shouldn't interfere with the standard
compilation of the compiler. You wouldn't edit the existing AST
nodes.
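Timon's alternative can be sketched in the spirit of dmd's templated Parser!AST: the semantic code is written once as a template over the AST family, and a library supplies its own family without touching the compiler's. All names below are illustrative:

```d
// Hypothetical sketch: semantic code written once as a template over the AST
// family. The library's family is defined separately, so the compiler's own
// nodes are never edited.
import std.stdio;

// The compiler's family.
struct ASTCodegen
{
    class Expression
    {
        string describe() { return "codegen node"; }
    }
}

// A library's own hookable family, free to carry extra state.
struct HookableAST
{
    class Expression
    {
        string[] hooks; // hypothetical extra state for tools
        string describe() { return "hookable node"; }
    }
}

// AST cannot be inferred from AST.Expression, so callers pass it explicitly.
string analyze(AST)(AST.Expression e)
{
    return "analyzed " ~ e.describe();
}

void main()
{
    writeln(analyze!ASTCodegen(new ASTCodegen.Expression()));   // analyzed codegen node
    writeln(analyze!HookableAST(new HookableAST.Expression())); // analyzed hookable node
}
```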
Walter said that if the compiler is properly modularized, it
isn't necessary to template this stuff. You just import a
different module for different behaviors if things are properly
encapsulated. One thing we might look at is seeing if we can
reduce the number of imports the AST nodes do. Try to make them
more pluggable. So instead of using templates and all of that
stuff, just import a different module with your own feature in
it. There are lots of ways to do it.
Timon asked how you'd tell the rest of the compiler to import your
module instead of the existing one. That would still need
templating. Walter said you'd do that by having your module in a
different directory and using the -I switch to point the compiler
to it. Then you've got a different implementation.
Razvan said that's not a library anymore. You'd have to replace
the existing files with your files. That doesn't solve the issue.
Walter: So what you're saying is it should be a compiled
dmd-as-a-library or a source code dmd-as-a-library.
Razvan said yes. If you're modifying the files, you might as well
just be using a fork of the compiler. What's the point? You want
to stay up to date with the latest compiler and you want to have
all the features there and just plug in what you want to use.
Adam said you don't modify the files, you replace them entirely
in the build system. But that is a pain. Razvan said he wanted to
reuse as much of the existing code as he could, whereas replacing a
file would require copy-pasting a fair amount of it.
Walter said he understood what Razvan was saying, but if we
better encapsulate the modules, that's much less of a problem to
stay up to date with the rest of the compiler. For example, he'd
finally gotten the lexer and parser to not import the rest of the
compiler. Now somebody can plug in a different lexer and a
different parser and then use the rest of the compiler
unmodified. He didn't see any fundamental reason why it couldn't
be done with AST nodes. Just import a different statement.d file
if you want to change the statement nodes. If it's properly
encapsulated, then it's easy to just replace the compiler's
statement.d with the user's version. And they can tweak it as
they see fit.
Timon said the fundamental problem to be solved is how to make
semantic reusable on many different kinds of AST nodes. He wasn't
convinced that replacing D files in the build system was a good
interface for that. Maybe there was some way to marry the two
approaches that wasn't too devastating for either of them.
Walter said there had been some discussion about replacing
build.d with a single command to dmd that just simply reads all
the files in the directory and compiles them all together. He
thought that was an interesting idea and that it would make it
much easier to build things out of the dmd source. He agreed that
build.d was overly complex for what it does, which should be just
compiling the files. Why is it this massive, complicated thing?
Átila said what people would ideally like to do is just declare dmd
as a dependency in a dub.json/dub.sdl file and have it just work.
Walter agreed.
Martin said this was exactly the same discussion as the one we'd
had before about putting linter stuff in the frontend. Ditto for
the decision to keep the check of unreachable code in the
compiler just so tools can use it. Replacing the D files wasn't
going to cut it. We were going to need something in between.
There will be a fork of the compiler as a base for
dmd-as-a-library. So we could do there the mixins, void*, or
whatever so that it can be used. It's a somewhat opened-up
interface to the frontend with some slight modifications. So you
could add modifications like void*, or add some extra state to
all of the AST nodes, extra visitors, or whatever extra
functionality we need which would cost us and which we don't want
to see in the frontend.
For example, if we have to remove the final attribute from some
of the performance-critical methods just to be able to use the
frontend as it is for arbitrary tools, we could just do it in the
fork. That would be a viable approach to have dmd-as-a-library as
a separate project on GitHub. We can update with every major or
minor version and tools could build on top of that. Then we don't
have to add all the extra stuff in the compiler itself and make
it slower or uglier in the process. That's similar to how LDC
uses dmd as a frontend. They rewrite the history of the monorepo,
exclude some stuff, move some files to other places so they can
use them more conveniently, and have some little adaptations like
extra fields in some places, replace functions, etc. It works.
Keeping up with the latest dmd changes isn't too bad. If that was
dealt with by the community team, similar to how we maintain dub,
we should be good.
Walter said he was glad Martin had brought that up. He suddenly
realized Martin and Iain were already using dmd as a library, so
their experience would be extremely informative on how to better
do this. And maybe by better supporting them, we'd be implicitly
better supporting dmd as a library. He didn't know how they were
integrating it into LDC and GDC. He said he was sorry he hadn't
thought of that before.
Martin said LDC and GDC were obviously modifying some of the frontend's D code,
at least LDC was, but only on very few occasions. With the rest of
the interface, they were interoperating with C++. That's a
special case for them and not comparable to other D tools that
would use dmd-as-a-library directly. He'd said before that it was
all working quite nicely. You couldn't do anything much better to
simplify his life in this regard. Some people had said they'd
like to see LDC modifications upstreamed, but he didn't share
that view; he wasn't proud of having all of those special case
additions in LDC that they do for the frontend editions. They
were probably added 10 years ago by Kai or someone. They're ugly,
but they're working. He wouldn't want to see those upstreamed
into dmd, nor all the LDC intrinsics upstreamed into DRuntime.
Mathias said those aren't special cases, they're use cases for
dmd-as-a-library. Martin said yes, but he didn't see the
connection. Mathias said they're good examples of the kinds of
features people would need for dmd-as-a-library. Currently,
they're hacks because you put them in an LDC version block and
modify the code. But by upstreaming them, you'd be turning the
hacks into proper configuration points, which is the point.
Martin agreed. His point was he didn't want to uglify the code
and reduce the readability of the runtime and the compiler's
source. The same thing applies to the linters or these planned
extension points. So we could keep the frontend as it is and have
that intermediate dmd-as-a-library project with some
modifications to simplify some of the hooks, add extra state, and
maybe even support replacing the whole AST if that functionality
is needed.
He continued by reiterating that Walter's suggestion of replacing
D files is obviously not what you want. You want to reuse most of
the functionality. Like the example Razvan brought up with the
unused imports. You just need to replace one little search
function. So having dmd-as-a-library as its own little project to
start hacking from is quite fine. And we can see how it evolves.
There was a side discussion about how the extern(C++) interface
affects dmd-as-a-library. Then I called on Steve, who had raised
his hand some time before, but I suggested we should wrap this
discussion up soon, as it didn't appear there was going to be a
clear point where we could do that.
Steve said this was very similar to the discussions we'd had
about a new Phobos version, i.e., reusing code that is compiled
in a completely different way but is the same code and we don't
want to make copies of it. We should think about the experience
from that and how we didn't end up with a good solution. Átila
agreed.
I said we should do what Razvan had suggested and take this
offline. I asked Razvan to start an email with everyone who might
have feedback CC'ed. And maybe we could establish a base from
which to work in a future monthly meeting or planning session.
Everyone agreed.
(UPDATE: There was some progress on this. Some separate meetings
focused on dmd-as-a-library were held in October. I did not
participate in these, but I'll post an update on what transpired
once I've caught up on the other summaries.)
More information about the Digitalmars-d-announce
mailing list