PROPOSAL: dsimpl
Russell Lewis
webmaster at villagersonline.com
Fri Feb 22 09:05:14 PST 2008
I have been thinking recently that it would make a lot of sense for us
to develop a "D simplifier." This is a tool which would take complex D
code, using all the cutting-edge features, perform syntax checking, and
then simplify it, producing D code based on a vastly reduced feature
set.
(I argue that it should produce D code, instead of C, because D has many
structural features, like dynamic arrays, garbage collection, modules,
newfangled-const, etc. which would be impractical to express in C.)
Originally, Walter described D as a language that was easy to parse and
easy to write a compiler for. While he has kept the language syntax
pretty easy to parse, I think we can all agree that it has developed,
over time, into a monster of complexity. There's a reason why GDC can't
keep up with DMD: implementing the language is hard.
To make things worse, D has evolved over time to include very complex
and (sometimes) poorly-specified features. IFTI (implicit function
template instantiation) is a classic example: you write a template,
close your eyes, and compile. It's a crap shoot whether the compiler
will deduce what seems obvious to you or whether it will fail.
Let me say right here, though, I don't mean this as a criticism of
Walter: IFTI, while a compelling and (IMHO) necessary feature, is
mind-bogglingly hard to get right. To write a compiler which could
deduce everything that a programmer might deduce is a Herculean task.
So I ask anybody: do you think you could write a compiler which deduced
everything that DMD deduced? Even if you could, how would you verify
that you did? If *anyone* tried to write a compiler of their own, the
inevitable consequence would be to fork the language into templates
which DMD can deduce, and templates which the other compiler can deduce.
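To make the distinction concrete, here is a small sketch of what an
IFTI rewrite might look like (the name max2 is invented for
illustration): the simplifier replaces the implicit call with an
explicit instantiation, so a back-end compiler never needs any
deduction logic at all.

```d
// A template whose type parameter the compiler must deduce:
T max2(T)(T a, T b) { return a > b ? a : b; }

void main() {
    // Before simplification: IFTI deduces T == int
    auto x = max2(3, 4);

    // After simplification: the instantiation is explicit, so a
    // back-end compiler needs no deduction machinery
    auto y = max2!(int)(3, 4);
}
```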
Then consider CTFE (compile time function evaluation). This seems
better specified...but would you say it would be "easy" to write a
compiler that included this feature?
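Not necessarily "easy" -- but a simplifier could evaluate any
compile-time call itself and emit only the resulting literal. A sketch
of the before/after (factorial is just an illustrative function):

```d
// Before simplification: a const initializer forces compile-time
// evaluation of the call
int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }
const int f5 = factorial(5);

// After simplification: the call is replaced by its computed literal,
// so the back-end compiler needs no CTFE interpreter
const int f5_simplified = 120;
```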
Because of this, we have, effectively, only a single compiler in the
ecosystem, and this is how it will stay until we change something.
The key insight I had was that the vast majority of the features that
make a D compiler complex (like IFTI, CTFE, mixins, etc.) are not things
which *structurally* change the program; they are deductions and
calculations carried out by the front-end.
So, my proposal is that we start an open-source project which is a "D
simplifier." This tool takes really complex D code, and performs
transformations on it...but only performs transformations that are
clear, always-valid, semantics-retaining transformations. That is, it
performs only the sorts of transformations that *any* D compiler will
need to make. The output is vastly simplified D code.
Then, once it is working, we hound Walter to make future language
changes (whenever possible) in this "dsimpl" tool, rather than making
private changes in DMD.
Of course, DMD (and all other compilers) will need to add new features
over time. Some features (like const) really need to be known by the
back-end D compiler. But most D features can be rewritten in a simpler
form, and dsimpl is the standard, shared tool which should accomplish
this rewriting.
My argument is that it would be easy - or at least much easier - to
write a D compiler which compiles that restricted subset. This would
make it practical to have an ecosystem with many different D compilers,
all of which reflect the same D standard.
This tool also makes it practical for people to contribute to language
development (when IFTI doesn't deduce your template, you can write the
code to make it work, and contribute it to the project) or to experiment
with new features.
THE COUNTER-ARGUMENT, AND WHY IT REALLY IS THE PRO-ARGUMENT
The obvious counter-argument to this is that any "dsimpl" tool, as I
envision it, would be very complex. But that is, of course, exactly the
reason why we *MUST* have this tool. If the translation tool is
complex, then building a compiler that does the translation *and* then
compiles the result is *VERY* complex.
IMHO, the only reason *not* to have a common, open-source, standard
translation tool would be if translation were easy!
SPECIFIC DESIGN
IMHO, if you are going to build an open-source parser for the D language
(which, of course, is the first step in writing a compiler/simplifier),
you should package the parser as a library. So I would split the
project into three parts:
* A parser library. The objects include toString() functions to produce
the D code that they represent. But the library could also be used
by other tools.
* A transformation program, which transforms the parsed code into a
new form. 3rd-party compilers might pick up the code at this point
and compile it immediately.
* Code which calls toString() on the transformed code to output the
simplified code as D.
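As a rough sketch of how the three parts might fit together (every
name here -- Ast, parse, simplify -- is invented for illustration, not
a real API; real versions would build and transform a full syntax
tree):

```d
// Hypothetical stand-ins for the three proposed parts.
class Ast {
    string text;
    this(string t) { text = t; }
    // Part 1: every node can print the D code it represents
    override string toString() { return text; }
}

// Part 1: the parser library turns source text into objects
Ast parse(string src) { return new Ast(src); }

// Part 2: the transformation program rewrites the tree; a 3rd-party
// compiler could consume the result directly at this point
Ast simplify(Ast m) { return m; }

void main() {
    Ast tree = parse("int x = 1;");
    Ast simple = simplify(tree);
    // Part 3: emit the simplified program as D source
    string output = simple.toString();
}
```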
THINGS I THINK A SIMPLIFIER SHOULD DO
My operating theory is that the simplifier should automatically do
things which reduce the complexity of the language. But it should limit
itself to things whose implementation is clear and likely to be
acceptable to all compiler writers. If we do a questionable
transformation, and as a consequence compiler X says "I can't use the
simplifier, it does the wrong thing," then we haven't really
accomplished what we need to. Here's my proposed (and incomplete) list
of things it could do:
* IFTI (make all templates explicit)
* CTFE (replace all compile time function calls with literals)
* Handle mixins
* Replace all "auto" variables (type deduction) with explicit types
* Make all calls to "typesafe variadic functions" (functions taking
variadic arrays) explicit - that is, manually build the array
argument which you will pass
* Resolve version() and debug() blocks
* Expand all aliases to the underlying types
* Convert foreach() statements over arrays into for() statements
* Convert foreach() statements over other types into delegate literals
and call them in the correct way
* Convert all operator overloading into member function calls
* Rewrite all integer & string literals in a standard form, so the
backend parser only has to parse one format for each type.
* ...many more things...
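As a sketch of two entries from this list -- replacing "auto" with the
deduced type, and lowering foreach over an array to a plain for()
statement -- the rewrite might look like this (function names
invented):

```d
// Before simplification:
int sumBefore(int[] a) {
    auto total = 0;
    foreach (x; a)
        total += x;
    return total;
}

// After simplification: auto becomes the deduced type, and the
// foreach is lowered to an explicit index loop
int sumAfter(int[] a) {
    int total = 0;
    for (size_t i = 0; i < a.length; i++) {
        int x = a[i];
        total += x;
    }
    return total;
}
```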
In addition, I think we could add a set of optional transformations, to
be used only by people who wanted them. Such as:
* Collapse all code (except library code) into a single source file
(necessary for some of the optimizations below). This would involve
name mangling of the symbols.
* Expand all templates into non-template entities (again, requiring
name mangling) and then remove the templates
* Declare structs for all delegate literals, with the literal converted
to a member function. Use the struct explicitly. (This removes any
need to support delegate literals in the simplified language.)
* When closures are required, allocate the struct on the heap
instead of on the stack.
* Inlining
* Dead code elimination
* Expression simplification & optimization
* Advanced whole-program analysis & optimization
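A sketch of the delegate-to-struct rewrite (all names invented): the
captured variable moves into an explicit struct, allocated on the heap
so it outlives the enclosing function, and the delegate literal becomes
an ordinary member function.

```d
// Before simplification: a delegate literal capturing a local
int delegate(int) makeAdder(int n) {
    return delegate(int x) { return x + n; };
}

// After simplification: the closure's frame is an explicit,
// heap-allocated struct, and the literal is a member function
struct AdderFrame {
    int n;
    int call(int x) { return x + n; }
}

int delegate(int) makeAdderSimplified(int n) {
    AdderFrame* frame = new AdderFrame;
    frame.n = n;
    return &frame.call;  // delegate bound to the heap-allocated frame
}
```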
WHO STARTS THIS?
Unfortunately, I need approval from my management chain before I start
any open source project, and it's very hard to get. So I can't start
this project. But, it's much easier for me to get approval to join a
project that is already ongoing. So this email is my proposal; if other
people think that this is a good idea, they'll have to get it well
started. But then I will join later. :)
Thoughts?
Russ
More information about the Digitalmars-d
mailing list