Writing a compiler in CTFE
MysteryMan
YouCanContactMe at Mystery.esp
Sun Jul 1 13:51:36 UTC 2018
I would like to create a compiler. I like D, but I also hate it! I
want to migrate to a new compiler, possibly a personal one that I
can easily customize and tweak to my heart's content.
For speed of development, instead of having to compile a compiler
that then compiles the program, I figured D's CTFE and import
expressions could do the work. For a monolithic compiler source
file, a single import is used to read it in.
This is all easily within D's grasp.
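A minimal sketch of the CTFE part, assuming a hypothetical toy
language (postfix integers and '+') that is not part of the
proposal. The function name evalPostfix and the file name in the
comment are made up for illustration; the point is only that an
ordinary D function can "run" a foreign program while dmd compiles.

```d
// Evaluates a toy postfix program such as "1 2 + 3 +".
// Assumes well-formed input: non-negative integers and '+' only.
int evalPostfix(string src)
{
    int[] stack;
    size_t i = 0;
    while (i < src.length)
    {
        char c = src[i];
        if (c >= '0' && c <= '9')
        {
            // Read a whole integer literal.
            int n = 0;
            while (i < src.length && src[i] >= '0' && src[i] <= '9')
            {
                n = n * 10 + (src[i] - '0');
                ++i;
            }
            stack ~= n;
        }
        else if (c == '+')
        {
            // Pop two operands, push their sum.
            int b = stack[$ - 1];
            int a = stack[$ - 2];
            stack = stack[0 .. $ - 2] ~ (a + b);
            ++i;
        }
        else
        {
            ++i; // skip whitespace
        }
    }
    return stack[$ - 1];
}

// In the setup described above the text would come from an import
// expression, e.g. import("program.toy") (hypothetical file name,
// requires passing -J<dir> to dmd). A literal keeps this self-contained.
enum program = "1 2 + 3 +";
enum result = evalPostfix(program); // enum forces evaluation at compile time
static assert(result == 6);
```

Because result is an enum, the toy program has already been
evaluated by the time the binary exists.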
The process is as follows:
We write D code (compiled with dmd) utilizing all the power of the
D language, but minimize complexity, since it is for bootstrapping
only and ideally exists only at version 0 of the compiler. That
version 0 parses our new language's grammar, and the new compiler
itself is written in that new language.
We can break the process up into 4 stages:
[Our new compiler's source code written in its own language]
->
[D source code that compiles sources in our new language at CTFE]
->
[DMD] ->
[Have the binary run on the source code from stage 1]
After these steps one has a binary that is the bootstrap compiler,
which can be used as the "core" compiler for the new language. It
accepts the core language, which should be minimally specified to
avoid complexity, bugs, etc., but still fully expressive.
To get the next version of the compiler away from dmd, one must
then alter the source code to supply new binary code generators in
place of the ones we relied on in stage 2. This is a lot of work,
as all semantics must be remapped from dmd's design to the new
language's design.
This last stage is where all the thought must be put in so we can
minimize design time.
So, we start with a well specified but arbitrary programming
language that has symbols and semantics for those symbols.
For example, we have the-super-tiny-compiler, which is written in
JavaScript:
https://github.com/jamiebuilds/the-super-tiny-compiler/blob/master/the-super-tiny-compiler.js
To make life more interesting, just assume this is done in D.
This could be our input to dmd's CTFE engine, for which we would
need a D parser that can parse the source code (it maps D
constructs to D constructs, so this is very easy; in fact, we can
just `mixin` the code directly. Imagine a mixinjs that mixes in JS
source code converted to D: a bit more complicated, but still
doable).
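As a rough sketch of "translate at CTFE, then `mixin`", here is the
transformation the-super-tiny-compiler performs (Lisp-style calls
to C-style calls) written as a D function. The names translate and
lispToD are hypothetical, the input is assumed to be well formed,
and add/subtract are defined here only so the mixed-in code has
something to call.

```d
// Translates one Lisp-style expression starting at src[pos] into
// D call syntax, advancing pos past it. Assumes well-formed input.
string translate(string src, ref size_t pos)
{
    while (pos < src.length && src[pos] == ' ')
        ++pos;

    if (src[pos] == '(')
    {
        ++pos; // consume '('
        size_t nameStart = pos;
        while (src[pos] != ' ' && src[pos] != ')')
            ++pos;
        string name = src[nameStart .. pos];

        string args;
        while (true)
        {
            while (src[pos] == ' ')
                ++pos;
            if (src[pos] == ')')
            {
                ++pos; // consume ')'
                break;
            }
            if (args.length)
                args ~= ", ";
            args ~= translate(src, pos); // nested call or literal
        }
        return name ~ "(" ~ args ~ ")";
    }

    // Anything else is treated as an atom (e.g. a number literal).
    size_t atomStart = pos;
    while (pos < src.length && src[pos] != ' ' && src[pos] != ')')
        ++pos;
    return src[atomStart .. pos];
}

string lispToD(string src)
{
    size_t pos = 0;
    return translate(src, pos);
}

// Targets for the generated calls.
int add(int a, int b) { return a + b; }
int subtract(int a, int b) { return a - b; }

// The translation itself runs at compile time...
static assert(lispToD("(add 2 (subtract 4 2))") == "add(2, subtract(4, 2))");
// ...and the generated D text is mixed in as an ordinary expression.
enum value = mixin(lispToD("(add 2 (subtract 4 2))"));
static assert(value == 4);
```

A mixinjs would follow the same shape, only with a much larger
translate step.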
What's interesting about this method is that one can always
(assuming no broken compatibility) use D to generate a new
bootstrap, and also use the last version to bootstrap itself.
The bootstrapped compiler automatically has all the features that
dmd has; for example, all of its target architectures are available
(this does require recompiling the bootstrap with the new dmd
args).
What's more, if we already had a CTFE compiler for our language,
we could use it inside any D program. As I've already shown with
mixinjs, we could have a mixin(import!(js)(file)) which converts
the JS code to D code and mixes it in directly. Some plumbing may
be required, but it would allow us to import not only D code into
D but also other languages (those that can easily be represented
in D).
For example, suppose we had a C to D compiler in the above sense.
import!C(C_file) will take any C file and map the source to D
source (most of the syntax is identical, so it is an easy mapping).
Some work is required; for example, it would have to map #include
X directives to import!C(X);. With some plumbing work we can use
any C code with D.
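The #include mapping could be sketched as a plain string rewrite
that runs at CTFE. Everything named here is an assumption for
illustration: rewriteIncludes is a made-up helper, and since
`import` is a D keyword the generated text uses a hypothetical
cImport template in place of import!C. Only quoted includes that
fill a whole line are handled.

```d
// Rewrites lines of the form  #include "name"  into
// mixin(cImport!("name"));  and leaves every other line untouched.
// cImport is a hypothetical template; it is only emitted as text here.
string rewriteIncludes(string src)
{
    enum prefix = "#include \"";
    string result;
    size_t lineStart = 0;

    foreach (i; 0 .. src.length + 1)
    {
        // A line ends at '\n' or at the end of the input.
        if (i == src.length || src[i] == '\n')
        {
            string line = src[lineStart .. i];
            if (line.length > prefix.length
                && line[0 .. prefix.length] == prefix
                && line[$ - 1] == '"')
            {
                string name = line[prefix.length .. $ - 1];
                result ~= "mixin(cImport!(\"" ~ name ~ "\"));";
            }
            else
            {
                result ~= line;
            }
            if (i < src.length)
                result ~= "\n";
            lineStart = i + 1;
        }
    }
    return result;
}

enum cSource = "#include \"util.h\"\nint x = 1;";
static assert(rewriteIncludes(cSource)
    == "mixin(cImport!(\"util.h\"));\nint x = 1;");
```

The declaration `int x = 1;` passes through unchanged because it is
already valid D, which is the "easy mapping" case.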
Such a concept would be very powerful indeed! But to accomplish
this in a general way, we need a very general way to specify a
compiler framework in D (one that works in CTFE for rapid
production) that makes it easy to represent most popular
languages.
Most of the work is in translating one grammar to the other, and
therefore this new framework must make translation easy.
E.g., the for loop in C is identical to the for loop in D, so a
direct mapping can be used. In MATLAB code the for loop looks
like for i = 1:10. This is, for the most part, just a
rearrangement of the for loop in C, so it too has a direct mapping.
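To illustrate the "rearrangement", here is a sketch that turns a
MATLAB loop header into a D for header at CTFE. matlabForToD and
trim are hypothetical helpers, the loop variable is assumed to be
an int, and only the exact form "for <var> = <lo>:<hi>" is
accepted (no step, no expressions). MATLAB ranges include the
upper bound, hence the <= in the generated code.

```d
// Removes leading and trailing spaces.
string trim(string s)
{
    while (s.length && s[0] == ' ')
        s = s[1 .. $];
    while (s.length && s[$ - 1] == ' ')
        s = s[0 .. $ - 1];
    return s;
}

// Maps "for i = 1:10" to "for (int i = 1; i <= 10; ++i)".
string matlabForToD(string header)
{
    enum kw = "for ";
    assert(header.length > kw.length && header[0 .. kw.length] == kw);
    string rest = header[kw.length .. $];

    // Locate the '=' and ':' separators.
    size_t eq = 0;
    while (rest[eq] != '=')
        ++eq;
    size_t colon = eq;
    while (rest[colon] != ':')
        ++colon;

    string var = trim(rest[0 .. eq]);
    string lo = trim(rest[eq + 1 .. colon]);
    string hi = trim(rest[colon + 1 .. $]);

    return "for (int " ~ var ~ " = " ~ lo ~ "; "
        ~ var ~ " <= " ~ hi ~ "; ++" ~ var ~ ")";
}

static assert(matlabForToD("for i = 1:10") == "for (int i = 1; i <= 10; ++i)");

// The generated header plus a D body, mixed in as a statement.
int sumOneToTen()
{
    int total = 0;
    mixin(matlabForToD("for i = 1:10") ~ " { total += i; }");
    return total;
}
static assert(sumOneToTen() == 55);
```

The corner cases start as soon as the range has a step or the body
uses MATLAB's 1-based indexing, which this sketch ignores.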
As best I understand it, we have our input language's grammar and
we want to map it to the D language grammar.
Hence we have a mapping between grammars.
This is a very complex issue because of several corner cases.
What I am proposing here is a discussion of ways to express this
problem so as to maximize expressivity while minimizing effort (the
good old min/max problem we all know and love).
I will start by expressing my two current positions on this
problem:
One of the first problems is to settle on terminology and discuss
the pathological issues that exist.