PROPOSAL: dsimpl
Russell Lewis
webmaster at villagersonline.com
Fri Feb 22 09:05:14 PST 2008
I have been thinking recently that it would make a lot of sense for us
to develop a "D simplifier." This is a tool which would take complex D
code, using all the cutting-edge features, perform syntax checking, and
then simplify it, producing D code based on a vastly reduced feature
set.
(I argue that it should produce D code, instead of C, because D has many
structural features, like dynamic arrays, garbage collection, modules,
newfangled-const, etc. which would be impractical to express in C.)
Originally, Walter described D as a language that was easy to parse and
easy to write a compiler for. While he has kept the language syntax
pretty easy to parse, I think we can all agree that it has developed,
over time, into a monster of complexity. There's a reason why GDC can't
keep up with DMD: implementing the language is hard.
To make things worse, D has evolved over time to include very complex
and (sometimes) poorly-specified features. IFTI (implicit function
template instantiation) is a classic example: you write a template,
close your eyes, and compile. It's a crap shoot whether the compiler
will deduce what seems obvious to you or whether it will fail.
Let me say right here, though, I don't mean this as a criticism of
Walter: IFTI, while a compelling and (IMHO) necessary feature, is
mind-bogglingly hard to get right. To write a compiler which could
deduce everything that a programmer might deduce is a Herculean task.
So I ask anybody: do you think you could write a compiler which deduced
everything that DMD deduced? Even if you could, how would you verify
that you did? If *anyone* tried to write a compiler of their own, the
inevitable consequence would be to fork the language into templates
which DMD can deduce, and templates which the other compiler can deduce.
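To make the distinction concrete, here is a small sketch of what an
IFTI rewrite might look like (the name max2 is invented for
illustration): the simplifier replaces the implicit call with an
explicit instantiation, so a back-end compiler never needs any
deduction logic at all.

```d
// A template whose type parameter the compiler must deduce:
T max2(T)(T a, T b) { return a > b ? a : b; }

void main() {
    // Before simplification: IFTI deduces T == int
    auto x = max2(3, 4);

    // After simplification: the instantiation is explicit, so a
    // back-end compiler needs no deduction machinery
    auto y = max2!(int)(3, 4);
}
```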
Then consider CTFE (compile time function evaluation). This seems
better specified...but would you say it would be "easy" to write a
compiler that included this feature?
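Not necessarily "easy" -- but a simplifier could evaluate any
compile-time call itself and emit only the resulting literal. A sketch
of the before/after (factorial is just an illustrative function):

```d
// Before simplification: a const initializer forces compile-time
// evaluation of the call
int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }
const int f5 = factorial(5);

// After simplification: the call is replaced by its computed literal,
// so the back-end compiler needs no CTFE interpreter
const int f5_simplified = 120;
```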
Because of this, we have, effectively, only a single compiler in the
ecosystem, and this is how it will stay until we change something.
The key insight I had was that the vast majority of the features that
make a D compiler complex (like IFTI, CTFE, mixins, etc.) are not things
which *structurally* change the program; they are deductions and
calculations carried out by the front-end.
So, my proposal is that we start an open-source project which is a "D
simplifier." This tool takes really complex D code, and performs
transformations on it...but only performs transformations that are
clear, always-valid, semantics-retaining transformations. That is, it
performs only the sorts of transformations that *any* D compiler will
need to make. The output is vastly simplified D code.
Then, once it is working, we hound Walter to make future language
changes (whenever possible) in this "dsimpl" tool, rather than making
private changes in DMD.
Of course, DMD (and all other compilers) will need to add new features
over time. Some features (like const) really need to be known by the
back-end D compiler. But most D features can be rewritten in a simpler
form, and dsimpl is the standard, shared tool which should accomplish
this rewriting.
My argument is that it would be easy - or at least much easier - to
write a D compiler which compiles that restricted subset. This would
make it practical to have an ecosystem with many different D compilers,
all of which reflect the same D standard.
This tool also makes it practical for people to contribute to language
development (when IFTI doesn't deduce your template, you can write the
code to make it work, and contribute it to the project) or to experiment
with new features.
THE COUNTER-ARGUMENT, AND WHY IT REALLY IS THE PRO-ARGUMENT
The obvious counter-argument to this is that any "dsimpl" tool, as I
envision it, would be very complex. But that is, of course, exactly the
reason why we *MUST* have this tool. If the translation tool is
complex, then building a compiler that does the translation *and* then
compiles the result is *VERY* complex.
IMHO, the only reason *not* to have a common, open-source, standard
translation tool would be if translation were easy!
SPECIFIC DESIGN
IMHO, if you are going to build an open-source parser for the D language
(which, of course, is the first step in writing a compiler/simplifier),
you should package the parser as a library. So I would split the
project into three parts:
* A parser library. The objects include toString() functions to produce
the D code that they represent. But the library could also be used
by other tools.
* A transformation program, which transforms the parsed code into a
new form. 3rd-party compilers might pick up the code at this point
and compile it immediately.
* Code which calls toString() on the transformed code to output the
simplified code as D.
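As a rough sketch of how the three parts might fit together (every
name here -- Ast, parse, simplify -- is invented for illustration, not
a real API; real versions would build and transform a full syntax
tree):

```d
// Hypothetical stand-ins for the three proposed parts.
class Ast {
    string text;
    this(string t) { text = t; }
    // Part 1: every node can print the D code it represents
    override string toString() { return text; }
}

// Part 1: the parser library turns source text into objects
Ast parse(string src) { return new Ast(src); }

// Part 2: the transformation program rewrites the tree; a 3rd-party
// compiler could consume the result directly at this point
Ast simplify(Ast m) { return m; }

void main() {
    Ast tree = parse("int x = 1;");
    Ast simple = simplify(tree);
    // Part 3: emit the simplified program as D source
    string output = simple.toString();
}
```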
THINGS I THINK A SIMPLIFIER SHOULD DO
My operating theory is that the simplifier should automatically do
things which reduce the complexity of the language. But it should limit
itself to things whose implementation is clear and likely to be
acceptable to all compiler writers. If we do a questionable
transformation, and as a consequence compiler X says "I can't use the
simplifier, it does the wrong thing," then we haven't really
accomplished what we need to. Here's my proposed (and incomplete) list
of things it could do:
* IFTI (make all templates explicit)
* CTFE (replace all compile time function calls with literals)
* Handle mixins
* Replace all "auto" variables (type deduction) with explicit types
* Make all calls to "typesafe variadic functions" (functions taking
variadic arrays) explicit - that is, manually build the array
argument which you will pass
* Resolve version() and debug() blocks
* Expand all aliases to the underlying types
* Convert foreach() statements over arrays into for() statements
* Convert foreach() statements over other types into delegate literals
and call them in the correct way
* Convert all operator overloading into member function calls
* Rewrite all integer & string literals in a standard form, so the
backend parser only has to parse one format for each type.
* ...many more things...
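As a sketch of two entries from this list -- replacing "auto" with the
deduced type, and lowering foreach over an array to a plain for()
statement -- the rewrite might look like this (function names
invented):

```d
// Before simplification:
int sumBefore(int[] a) {
    auto total = 0;
    foreach (x; a)
        total += x;
    return total;
}

// After simplification: auto becomes the deduced type, and the
// foreach is lowered to an explicit index loop
int sumAfter(int[] a) {
    int total = 0;
    for (size_t i = 0; i < a.length; i++) {
        int x = a[i];
        total += x;
    }
    return total;
}
```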
In addition, I think we could add a set of optional transformations, to
be used only by people who wanted them. Such as:
* Collapse all code (except library code) into a single source file
(necessary for some of the optimizations below). This would involve
name mangling of the symbols.
* Expand all templates into non-template entities (again, requiring
name mangling) and then remove the templates
* Declare structs for all delegate literals, with the literal converted
to a member function. Use the struct explicitly. (This removes any
need to support delegate literals in the simplified language.)
* When closures are required, allocate the struct on the heap
instead of on the stack.
* Inlining
* Dead code elimination
* Expression simplification & optimization
* Advanced whole-program analysis & optimization
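A sketch of the delegate-to-struct rewrite (all names invented): the
captured variable moves into an explicit struct, allocated on the heap
so it outlives the enclosing function, and the delegate literal becomes
an ordinary member function.

```d
// Before simplification: a delegate literal capturing a local
int delegate(int) makeAdder(int n) {
    return delegate(int x) { return x + n; };
}

// After simplification: the closure's frame is an explicit,
// heap-allocated struct, and the literal is a member function
struct AdderFrame {
    int n;
    int call(int x) { return x + n; }
}

int delegate(int) makeAdderSimplified(int n) {
    AdderFrame* frame = new AdderFrame;
    frame.n = n;
    return &frame.call;  // delegate bound to the heap-allocated frame
}
```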
WHO STARTS THIS?
Unfortunately, I need approval from my management chain before I start
any open source project, and it's very hard to get. So I can't start
this project. But, it's much easier for me to get approval to join a
project that is already ongoing. So this email is my proposal; if other
people think that this is a good idea, they'll have to get it well
started. But then I will join later. :)
Thoughts?
Russ
More information about the Digitalmars-d
mailing list