xdc: A hypothetical D cross-compiler and AST manipulation tool.

Chad Joan chadjoan at gmail.com
Sun Jul 21 23:10:58 PDT 2013


On Saturday, 20 July 2013 at 06:28:03 UTC, deadalnix wrote:
> On Saturday, 20 July 2013 at 04:44:38 UTC, Chad Joan wrote:
>> On Saturday, 20 July 2013 at 04:38:20 UTC, cal wrote:
>>> On Thursday, 18 July 2013 at 03:26:10 UTC, Chad Joan wrote:
>>> [...]
>>>
>>> Is the input to xdc a semantically-analyzed D AST, or does 
>>> semantic analysis occur during pattern-matching/lowering?
>>
>> The latter.
>>
>> xdc would accept D code as text input (.d files) and parse it 
>> to produce its own AST.  Semantic analysis is then done by 
>> matching patterns in the AST and doing substitutions until all 
>> that's left are the AST nodes the backend wants.  The backend 
>> then matches patterns and emits the desired output (instead of 
>> substituting AST nodes).
>
> I'm not sure how you'll handle all compile time features.

To be honest, I hadn't yet written down what these would look 
like.  However, I did have an idea of what I wanted them to look 
like.

So, I will try to write down my thoughts on the subject.  Here, 
have a wall of text ;)

For Templates:

Note that the machine running the pattern-match-and-replace is 
really only constrained to a DFA/Packrat when it is recognizing.  
Once a pattern is recognized, D code may be invoked to do a 
(hopefully minimal) amount of computation.  Also, when a pattern 
is substituted, then it may return to the beginning of the 
substitution.

The set-the-cursor-at-begging-of-substitution thing is something 
I'm not entirely sure of yet, but it seems like a good way to 
avoid an explosion in the number of passes:
<cursor>auto foo = Templ!T();
<cursor>Templ!T foo = Templ!T();
<cursor>_D5TemplMangleMangle foo = Templ!T();
_D5TemplMangleMangle foo = <cursor>Templ!T();
_D5TemplMangleMangle foo = <cursor>Templ!T.__ctor();
_D5TemplMangleMangle foo = 
<cursor>_D5TemplMangleMangle6__ctorMangle();
...
The real deal wouldn't omit so many steps, but hopefully this 
conveys the usefulness.
Now, that /may/ be useful in template instantiation.  It is 
mostly for convenience though.  Still, it is noteworthy in that 
it is not in the conventional realm of 
DFAs/Packrats/Formal-Language-Theory: those things usually do not 
talk about what happens when the input is modified.  In other 
words, the use of DFAs/packrats in semantic analysis does not 
limit its computational power.

The more important thing for templates though is this: that part 
where the D code may be invoked after pattern recognition.  And 
it's important because it allows you to do more 
recognize-and-substitute without losing your place.  It allows a 
kind of recursion.  See the insides of the example 
"lowerValueTemplateInstantiation" handler I wrote in this post 
later on.

To be more concrete, I present a step-by-step followed by some 
examples of what I think the code might look like.

=================================================================

So imagine you have a template to be instantiated:
template Fib(uint i)
{
     static if ( i <= 1 )
         const Fib = i;
     else
         const Fib = Fib!(i-1) + Fib!(i-2);
}

Somewhere else, this appears:
     writefln("Fib == %s", Fib!4);

Suppose the parameter "i" is 4.  Then we really want to end up 
substituting this template with the following line:
   const _D4main11__T3FibVi4Z3Fibxk = 3;
and it should also emit these, as a side effect:
   const _D4main11__T3FibVi0Z3Fibxk = 0;
   const _D4main11__T3FibVi1Z3Fibxk = 1;
   const _D4main11__T3FibVi2Z3Fibxk = 1;
   const _D4main11__T3FibVi3Z3Fibxk = 2;

To save time and not bore you, I will cheat and set the cursor
at the template instantiation and skip all the writefln stuff.
Our lowering proceeds like so:

// We begin!

writefln("Fib == %s", <cursor>Fib!4 );

// "lowerValueTemplateInstantiation" catches the Fib!4.
// Make new xdc context; jump to the template declaration.
// Initial state:

template Fib(uint i)
{
     static if ( i <= 1 )
         const Fib = i;
     else
         const Fib = Fib!(i-1) + Fib!(i-2);
}

// Substitute i using "substituteTemplateParams"

template Fib(uint i)
{
     static if ( 4 <= 1 )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

// Invoke "lowerTemplateDecl"

<cursor>template Fib(uint i)
{
     static if ( 4 <= 1 )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

/****************
But wait, "lowerTemplateDecl" doesn't meet it's constraint yet. 
consumes = "!static_if, !static_foreach" is not satisfied.
****************/
// Invoke "lowerStaticIf" (sorry, not written yet)

template Fib(uint i)
{
     <cursor>static if ( 4 <= 1 )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

...

template Fib(uint i)
{
     static if ( <cursor>4 <= 1 )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

// "lowerStaticIf" needs a literal here, not an expression.
// Invoke "constantFold" (sorry, not written yet. Uses CTFE.)

template Fib(uint i)
{
     <cursor>static if ( false )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

// "lowerStaticIf" may now proceed and finish.

template Fib(uint i)
{
     const Fib = Fib!(4-1) + Fib!(4-2);
}

// "lowerTemplateDecl"'s !static_if constraint is now satisfied
// It mangles the constant identifier and moves it to the root.

const _D4main11__T3FibVi4Z3Fibxk = Fib!(4-1) + Fib!(4-2);

/****************
The newly substituted declaration gets subjected to further 
reductions, as that's what happens after a substitution.  Another 
pattern, let's call it "lowerConstDecl", notices the constant 
declaration sitting there with an /expression/ (oh dear) instead 
of a literal.  It invokes "constantFold".
****************/
// This starts whole process over again.  Repeatedly.

const _D4main11__T3FibVi0Z3Fibxk = 0;
const _D4main11__T3FibVi1Z3Fibxk = 1;

const _D4main11__T3FibVi2Z3Fibxk =
     _D4main11__T3FibVi1Z3Fibxk + _D4main11__T3FibVi0Z3Fibxk;

const _D4main11__T3FibVi3Z3Fibxk =
     _D4main11__T3FibVi2Z3Fibxk + _D4main11__T3FibVi1Z3Fibxk;

const _D4main11__T3FibVi4Z3Fibxk =
     _D4main11__T3FibVi3Z3Fibxk + _D4main11__T3FibVi2Z3Fibxk;

// Constant folding continues.

const _D4main11__T3FibVi0Z3Fibxk = 0;
const _D4main11__T3FibVi1Z3Fibxk = 1;
const _D4main11__T3FibVi2Z3Fibxk = 1;
const _D4main11__T3FibVi3Z3Fibxk = 2;
const _D4main11__T3FibVi4Z3Fibxk = 3;

/****************
There is nothing left to do here, so we return to the previous 
context.
****************/

writefln("Fib == %s", <cursor>Fib!4 );

// The instantiation figures out the mangling.

writefln("Fib == %s", <cursor>_D4main11__T3FibVi4Z3Fibxk );

// Done.  (for now)

writefln("Fib == %s", _D4main11__T3FibVi4Z3Fibxk );

=================================================================
== The more central code in all of this might look like so:

const lowerValueTemplateInstantiation =
{
     auto consumes = "value_template_instantiation";
     auto produces = "";

     auto recognizer = Pattern!
         // Ex: main.Fib!(4)
         "ValueTemplateInstantiation $valueTemplateInstatiation has
         {
             // Ex: main.Fib
             IdentifierPath $path;

             // Ex: (4)
             // We ask for LiteralExpr here to coerce the engine
             //   into doing constant folding on whatever
             //   expressions where in the argument tuple.
             ArgsTuple has
                 any_amount_of LiteralExpr $args;
         }";

     auto action = (PatternMatch!(recognizer) m)
     {
         auto templateDeclNode =
             m.xdcContext.symbolLookup(m.getCapture!"path");

         // This is where the recursion happens:

         // First, we create a context that we can scope the
         //   template parameters into.
         auto context = m.xdcContext.push();
         scope(exit) m.xdcContext.pop();

         // Last, we use the new context to jump to another
         //   location in the AST and tell it to conquer some
         //   template declarations for us.
         context.declare("args", m.args);
         auto tmpNode =
             context.invoke!substituteTemplateParams(
                 templateDeclNode);
         context.invoke!lowerTemplateDecl(tmpNode);

         // Mangling will require its own recursive joy ride.
         // It will probably be simpler though, because it doesn't
         //   require any substitutions.
         // It helps that we've already ensured that all of the
         //   arguments to the template instantiation have been
         //   reduced to literals by this point, which will be
         //   possible to mangle.  (Attempting to mangle an
         //   arbitrary expression would probably be a throwable
         //   offense, or better yet, forces xdc to not compile.)
         VarExpr e = new VarExpr(
             m.getCapture!"valueTemplateInstatiation".mangle );

         m.captures.add("mangledSymbolReference", e);
     };

     auto synthesizer = Pattern!"$mangledSymbolReference";

     return new PatternHandler(
         produces, consumes, recognizer, action, synthesizer);
};

/* This does NOT get registered with the rest of the global
match/replace patterns.  It should only be invoked by template
instantiation handlers, not the xdc engine itself.
In particular, it needs the "args" context to be defined.
*/
const substituteTemplateParams =
{
     // It gets manually invoked, so no need to mention depends.
     auto consumes = "";
     auto produces = "";

     auto recognizer = Pattern!
         // Ex: main.Fib( uint i ) { ... }
         "TemplateDecl $template has
         {
             .parameterList $params;

             any_amount_of
             {
                 any_amount_of .;
                 VarExpr $var;
             }
             any_amount_of .;
         }";

     auto action = (PatternMatch!(recognizer) m)
     {
         // Cop out: This might seem like stuff that would be
         //   suited to the pattern-DSL, but I am not yet sure
         //   that I want to even attempt to teach it how to
         //   generate lookup tables to accomplish this kind of
         //   substitute-from-backreference type of work.
         //   The syntax for that might be nasty anyways.
         //   For now, I'll implement it with this D code.

         AstNode[string] nameLookup;
         auto params = m.getCapture!"params";
         auto args = m.xdcContext.get!"args";

         foreach( i, ref param; params )
         {
             // Populate the lookup table.
             nameLookup[param.identifier] = i;

             // ... aaaand ...
             // Turn this copy of the template into an extremely
             //   specialized one where all of the parameters
             //   already have default values (or types).
             // This will probably be needed for mangling later.
             param.initializer = args[i].deepCopy();
         }

         // Substitute parameter names appearing in the template
         //   with the corresponding literal from the instatiating
         //   code.
         foreach( ref varExpr; m.getCapture!"var" )
         {
             size_t i = nameLookup[varExpr.identifier];
             varExpr.replaceWith( m, args[i].deepCopy() );
         }
     };

     // The necessary substitutions were too complicated for the
     //   pattern language.  Thus, they have already been handled
     //   in the action phase.  We leave the original template
     //   untouched.
     auto synthesizer = Pattern!"$template";

     return new PatternHandler(
         produces, consumes, recognizer, action, synthesizer);
}

/* This does NOT get registered with the rest of the global
match/replace patterns.  It should only be invoked by template
instantiation handlers, not the xdc engine itself.
In particular, it needs the template parameters to have already
been substituted.
*/
const lowerTemplateDecl =
{
     // Rejecting static-if statements and static-foreach
     //   will force the invoking context to lower those into
     //   declarations before proceeding with this match attempt.
     auto consumes = "!static_if, !static_foreach";
     auto produces = "";

     auto recognizer = Pattern!
         "TemplateDecl $template has
         {
             any_amount_of { DeclStatements $decls; };
         }";

     auto action = (PatternMatch!(recognizer) m)
     {
         AstRootNode root = m.xdcContext.getRoot();
         foreach( ref decl; m.getCapture!"decls" )
         {
             decl.identifier = decl.mangle;
             decl.moveTo(root);
         }
     };

     auto synthesizer = Pattern!"";

     return new PatternHandler(
         produces, consumes, recognizer, action, synthesizer);
}

// This pattern handler goes in the global ("all_semantic") set
//   of pattern handlers and will cause all template declarations
//   to disappear once all of the instantiations have been
//   completed.  This is the end of the line for all templates!
const cleanupTemplateDecls =
{
     auto consumes = "template_decl";
     auto produces = "";

     auto recognizer = Pattern!
         "Root $root has
         {
             any_amount_of
             {
                 any_amount_of not TemplateInstatiation;
                 TemplateDecl $templates;
             }
             any_amount_of not TemplateInstatiation;
         }";

     auto action = (PatternMatch!(recognizer) m)
     {
         foreach( ref template; m.getCapture!"templates" )
             template.removeFromTree();
     }

     auto synthesizer = Pattern!"$root";

     return new PatternHandler(
         produces, consumes, recognizer, action, synthesizer);
}


Of course I'm leaving out some things like template parameter 
specialization and template constraints.  I imagine that things 
like this (ex: overloading) will probably require some D code to 
handle.  This will likely be natural, since these kinds of things 
are usually described as a sequence of logical rules or some kind 
of filter.

=================================================================

As for CTFE... I'll have to write that down later.
As it is, it already took me a while to write down all of my 
thoughts on templates.

The original post already dropped a lot of hints though.  It 
pretty much involves lowering the to-be-executed code down to 
something that the interpreter backend can handle, and then 
invoking the interpreter on it.

=================================================================
Other notes and rambling:

I am actually going to go after CTFE very early on in xdc's 
development, specifically because it will be useful for constant 
folding, template instantiation, and (indirectly) maybe even 
strings.  The process might look like this:
- Implement simple builtin types and expressions (char, int,
     float, int*, 3+4, *(foo + 4), etc.).  No arrays, no strings.
- Implement CTFE using a very simple interpreter.
     This gives us an invokable constant folder.
- Implement structs.
- Implement templates.  Templates requiring strings will throw
     exceptions and fail to compile at this point.
     (remember: no strings!)
- Implement operator overloading.
- Implement arrays as a struct-template:
     struct Array(T) { T* ptr; private size_t len; ... }
     This code will only be visible to the compiler, and will be
     used in lowerings whenever needed.  This gets us strings.
- Implement string literals.  (This might get interesting, and
     may even depend on platform, but it should ultimately do
     something similar to calling
     Array!char.__ctor(char *data, size_t len).)
- Templates that use strings will now work.
- Implement reference counting so the whole thing doesn't leak
     memory like a sieve.

It'd look very different from D's actual history.  This is 
because I consider features like templates and CTFE to be very 
"low": we can rewrite a lot of other language features into them.

For situations where things like operator overloading can't 
accomplish what the original builtins could do, then there will 
probably be some necessary compiler magic.  I would apply it as 
conservatively as possible.

This might also get dynamically reconfigured depending on 
platform.  Some platforms might already have builtin string types 
that can be efficiently coerced into behaving like D strings.  In 
those cases you would want to avoid lowering strings into a 
ptr+length struct, and let the backend grab them first.  It may 
even be efficient/helpful to have the interpreter backend behave 
like such a platform and just operate on strings directly without 
first lowering them into a struct.

...

Hope that helps.


More information about the Digitalmars-d mailing list