How can we make it easier to experiment with the compiler?

Mon May 24 02:25:33 UTC 2021

On Sunday, 23 May 2021 at 06:12:30 UTC, Ola Fosheim Grøstad wrote:
> I think there are many that would like to experiment with the 
> compiler, but feel discouraged because they don't know how to 
> approach it.
>
> I think this not only comes down to documentation, but also 
> is structural. In order to figure out what to improve, the best 
> starting point is experienced challenges.
>
> The number one challenge I see is keeping track of DMD as it is 
> released with new improvements. Basically reapplying the 
> changes made to the experimental branch to the main branch (aka 
> "rebasing"?).

(That is the correct terminology.) I suspect this is more of a 
problem for people that are less familiar with git, which might 
well also include people wanting to play around with DMD, e.g. 
GSoC/SAoC students.
I know this was the case for me while developing dcompute with 
the added difficulty of tracking LLVM on top of LDC (which was 
kept in sync with DMD).

> I suspect that kills many efforts, meaning people create a 
> fork, start making changes, but then a new version of DMD is 
> released and the fork is left to dry in the sun as rebasing is 
> not fun. And well, a hobby that isn't fun, is not a good hobby. 
> :-D

The solution to this is better git skills, not so much better 
compiler skills or knowledge of DMD, although a merge conflict in 
a critical piece of code is always a PiTA. We now have 
Slack/Discord for people to ask these kinds of questions, and I'm 
sure they will get answers if they are trying to do something 
interesting or fix an annoying problem.
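
For anyone unsure what "rebasing the fork" means in practice, here is a minimal sketch of the usual command sequence, wrapped in Python so it can be scripted. The remote names `upstream` and `origin`, the branch name `master`, and the topic branch `my-experiment` are assumptions for illustration; adjust them to your own setup.

```python
import subprocess

def rebase_plan(topic, upstream_remote="upstream", upstream_branch="master"):
    """Return the git commands (as argv lists) that replay `topic` onto upstream.

    The remote and branch names are assumptions, not something DMD mandates.
    """
    target = f"{upstream_remote}/{upstream_branch}"
    return [
        # Record conflict resolutions so repeated rebases can reuse them.
        ["git", "config", "rerere.enabled", "true"],
        # Download the latest upstream commits without touching local branches.
        ["git", "fetch", upstream_remote],
        # Check out the topic branch and replay its commits on top of upstream.
        ["git", "rebase", target, topic],
        # Update the fork; refuses if the remote branch moved unexpectedly.
        ["git", "push", "--force-with-lease", "origin", topic],
    ]

def run_plan(plan, repo_dir="."):
    """Run each command in order, stopping at the first failure (e.g. a conflict)."""
    for argv in plan:
        subprocess.run(argv, cwd=repo_dir, check=True)

if __name__ == "__main__":
    # Print the plan instead of running it, so nothing is changed by accident.
    for argv in rebase_plan("my-experiment"):
        print(" ".join(argv))
```

If the rebase step stops on a conflict, you resolve the files by hand and continue with `git rebase --continue`; that part is where asking on Slack/Discord helps.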

> Better internal compiler structure would help a lot with this. 
> So a prioritized list for me would be:

Oh god yes. The directory structure, or rather lack thereof, is a 
really dire repellent for newcomers. I cannot overstate this. 
173 files in dmd/src/dmd is _completely_ unacceptable. However, 
Walter seems to like it this way and has struck down PRs trying 
to remediate this in the past (because it doesn't suit his editor 
configuration? or something like that).

We should have at least the following folders:
ast: ast_node, dsymbol, aggregate, et al
semantic: semantic2, semantic3, ob, nogc, safe, et al
visitors: parsetimevisitor, permissivevisitor, visitor, et al
glue (backend interfacing files): lib[.*], scan[.*], toir, s2ir, 
e2ir, et al
lex: lexer, tokens, identifier, id, utf, et al
headers: (alas still needed until dtoh works well enough and has 
been stable for enough releases for GDC to bootstrap)
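
To make the proposal concrete, the folder list above can be written down as data and turned into a list of `git mv` commands. This is only a sketch: the `src/dmd` root and the `.d` extension are assumptions, the "et al" entries and the `lib[.*]`/`scan[.*]` wildcards are deliberately not expanded, and a real move would also need module declarations and imports updated to match the new packages.

```python
from pathlib import PurePosixPath

# Folder -> modules, copied from the list above. "et al" is not expanded and
# the lib[.*]/scan[.*] wildcards are left out because they name file families.
PROPOSED_LAYOUT = {
    "ast": ["ast_node", "dsymbol", "aggregate"],
    "semantic": ["semantic2", "semantic3", "ob", "nogc", "safe"],
    "visitors": ["parsetimevisitor", "permissivevisitor", "visitor"],
    "glue": ["toir", "s2ir", "e2ir"],
    "lex": ["lexer", "tokens", "identifier", "id", "utf"],
}

def plan_moves(layout, src_root="src/dmd", ext=".d"):
    """Return the commands (as argv lists) that would move modules into folders.

    `src_root` and `ext` are assumptions about the checkout, not verified here.
    """
    root = PurePosixPath(src_root)
    commands = []
    for folder, modules in layout.items():
        # git mv needs the destination directory to exist already.
        commands.append(["mkdir", "-p", str(root / folder)])
        for module in modules:
            old = root / f"{module}{ext}"
            new = root / folder / f"{module}{ext}"
            commands.append(["git", "mv", str(old), str(new)])
    return commands

if __name__ == "__main__":
    # Print the plan only; nothing is moved.
    for argv in plan_moves(PROPOSED_LAYOUT):
        print(" ".join(argv))
```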

> 1. Have a clean separation between frontend and backend, that 
> is close to plug-and-play. That would allow people to inject a 
> new high level IR between frontend and backend that could open 
> for new interesting optimizations, and allow all the compilers 
> to benefit from it.

See also https://mlir.llvm.org. I had a GSoC student try to do 
something with this, but I don't think it got to a usable state. 
Still, this is about as state of the art as it gets and a very 
interesting research direction. Rust and Swift use multiple 
levels of IRs.

Also from what I understand, the pointer and liveness analysis as 
part of DIP 1000/1040/(other Walter DIPs?) does something like 
this, but in a hacked up, nonstandard manner.

> 2. Break down source files into smaller units, so that stable 
> parts are separated from unstable parts.

Urgh. Dealing with 10000 line files and 1000 line functions is 
such a drain when trying to get stuff done (looking at you, 
expressionsem.d). However, this needs to be combined with 
directories/packages or it will not improve the situation.
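
If you want to check numbers like these against your own checkout, a small script is enough. The sketch below counts the source files sitting directly in one directory and lists those at or above a line limit; the `*.d` pattern and the 10000 line default are assumptions chosen to match the figures mentioned here.

```python
from pathlib import Path

def survey(src_dir, pattern="*.d", line_limit=10_000):
    """Return (file_count, oversized) for files directly inside src_dir.

    `oversized` is a list of (line_count, file_name), largest first.
    """
    sizes = []
    for path in sorted(Path(src_dir).glob(pattern)):
        # errors="replace" keeps the count going if a file is not valid UTF-8.
        with path.open(encoding="utf-8", errors="replace") as f:
            sizes.append((sum(1 for _ in f), path.name))
    oversized = sorted((s for s in sizes if s[0] >= line_limit), reverse=True)
    return len(sizes), oversized
```

Calling `survey("src/dmd")` from the repository root (path assumed) gives the flat file count and the files that are candidates for splitting.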

> 3. More encapsulation and separation of responsibility.
>
> 4. Switch to a more syntactical AST, possibly enabling AST 
> macros in the future without too much hassle, then use an IR 
> for real work.

That is a noble goal, but it would require _a lot_ of changes both 
in DMD and in downstream LDC and GDC, and in tools that consume 
the AST and expect it to be complete. Not to mention designing 
said IR and redoing semantic analysis/transformations to work 
with it.
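
For readers new to the idea, here is a toy illustration of what "keep the AST syntactic, do the real work on an IR" means. It is not how DMD, LDC or GDC work; the node classes and the three-address tuple format are made up for this sketch. The tree stays exactly as parsed, and a separate lowering pass produces a flat instruction list that later passes could analyse or transform.

```python
from dataclasses import dataclass
from itertools import count

# A purely syntactic AST: it mirrors the source and is never rewritten.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object

def lower(expr):
    """Lower an expression tree to three-address code.

    Returns (instructions, result_name); each instruction is the tuple
    (dest, op, lhs, rhs). The format is hypothetical, for illustration only.
    """
    code = []
    temps = count()

    def visit(node):
        if isinstance(node, Num):
            return str(node.value)
        if isinstance(node, Var):
            return node.name
        if isinstance(node, BinOp):
            lhs = visit(node.left)
            rhs = visit(node.right)
            dest = f"t{next(temps)}"
            code.append((dest, node.op, lhs, rhs))
            return dest
        raise TypeError(f"unknown node: {node!r}")

    result = visit(expr)
    return code, result

if __name__ == "__main__":
    # a + b * 2
    tree = BinOp("+", Var("a"), BinOp("*", Var("b"), Num(2)))
    for instruction in lower(tree)[0]:
        print(instruction)
```

Even in this toy the cost is visible: every consumer that used to walk the tree for semantic information now has to be taught about the second representation.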

> 5. Use directories.

Yes!!! Sooo much yes! See above.

> 6. Improved documentation.
>
> 7. Tutorials.
>
> What other items should be on the list?

Try to make sure we use standard terminology for things so that 
people can reliably search for them.

> Which items are feasible in the next 6 months?

Directories.