How can you read and understand the source of *naryFun in functional.d?

Sat Jan 29 13:14:56 PST 2011

I hope this gets pasted right, otherwise I'll have to open a freakin' blog. :)

Up until a few months ago, I had no experience with templates whatsoever.

At first, I started using function templates, because they were the
easiest to grasp. Or so I thought. Basically I just used function
templates as if they are a special kind of function that takes any
types of arguments. Sort of like having a regular Python function in
your D code. I didn't quite grasp the point of templates back then.

What really opened my eyes was the realization that in D there's a
distinct notion between manipulating types at compile-time and
manipulating types at run-time.

Once I've realized that, I knew why template arguments have to be
known at compile time, because templates are a compile-time feature.
And templates aren't really a "type" at runtime, they're only visible
at compile-time. They're just "placeholders" with their own scope, and
they hold declarations in that scope. And to access those declarations
(or fields would be a proper word), you generally use the dot
operator. There's also the eponymous trick where you can use the
template instantiation itself to refer to one of its fields (this is
useful when you're only interesting in getting back one result, as it
saves you the burden of having to explicitly specify that field when
instantiating the template).

Then, I've realized that templates can be used almost anywhere. A
template can have fields which are templated structs and classes,
which in turn can have templated functions, which in turn have
templated declarations in them, and the functions can have templated
return values.

So, let's talk about unaryFun now. unaryFun is a template which has
the ability to construct a function, by simply passing a string as an
argument. It's named *unary* because it constructs a function that
takes only a single argument. You pass the string in the syntax form
where it's as if you're only writing the return expression. unaryFun
has the ability to convert some special predefined characters to
parameters of a function. For example, it recognizes that 'a' should
be replaced with the argument to the newly constructed function.

Here's a simple instantiation of unaryFun:
unaryFun!("(a & 1) == 0")

unaryFun takes the string argument and might create this function:

Let me walk you through unaryFun line by line (I'm using DMD 2.051).
First, our example code:

alias unaryFun!("(a & 1) == 0") isEven;
assert(isEven(2) && !isEven(1));

Now open std.functional.

Line 40: unaryFun instantiates unaryFunImpl. You'll notice it passes
the same arguments to unaryFunImpl as the ones it got from the calling
code. We passed a string, and the two other arguments are set to
'false' and 'a' by default, since we didn't specify them. Once the
unaryFunImpl template is instantiated, we can select any of its fields
with the dot operator. In this case, we're interested in the result
field.
So.. what does the alias do? It's a nice thing called the eponymous
trick, where you can use the name of the template itself as the result
of the instantiation. In other words, if you alias the name of a
template to some declaration, when you instantiate that template you
won't have to use the dot prefix to get a single result.
A quick example without the eponymous trick:
template foo(int value)
{
    int result = value;
}
void main()
{
    writeln( foo!(4).result );
}

And here's one with the eponymous trick:
template foo(int value)
{
    int result = value;
    alias result foo;  // we've aliased the foo template name to the
result field
}
void main()
{
    writeln( foo!(4) );  // no need to explicitly use .result here
}

This is very handy when you're only interested in getting one field of
a template.
Now let's see how unaryFunImpl works.

Line 45: This is a *compile-time* check. It tries to determine if the
fun argument is a string.
There's an else clause way down on line 98, which, in case the fun
argument isn't a string, simply aliases the fun argument to the name
'result'.
So, why do we need to check if the 'fun' argument is a string? Well,
because 'fun' in the template
signature of unaryFunImpl is defined as an 'alias' type. An alias is a
parameter type that can be a string, a value, a type, a function, or
any other valid D symbol name. The reason alias is used in
unaryFunImpl is because we can call unaryFunImpl in different ways. In
our own code we've used:

void main()
{
    alias unaryFun!("(a & 1) == 0") isEven;
}

In this case the 'static if' check on line 45 will pass since the
argument is a string. But we could have called unaryFun like so:

void foo(int x)
{
}
void main()
{
    alias unaryFun!(foo) isEven;
}

In this case we're passing a function parameter at compile time. The
'static if' check will evaluate to false since the fun parameter is an
actual function, not a string. This brings us to the else clause on
line 98. Since the whole purpose of unaryFun is to construct a
function out of a string, it doesn't make sense to do any work if
'fun' isn't a string. And if 'fun' isn't a string, it's likely a
function, so we alias it to 'result' and we're done. 'fun' could also
be a delegate (a function pointer), or a struct with a defined opCall
method (these kinds of structs can be called as if they were
functions).

Ok let's continue. We've passed a string ("(a & 1) == 0"), so our
static if check on line 45 evaluates to true. Now comes the fun part,
a template within a template!

Line 47: 'Body' is a template that is only used within unaryFunImpl,
which explains why it's nested inside it. It takes one compile-time
argument, in this case that argument will be a 'type' argument (such
as int/double/etc).
Let's just assume for a second that we've instantiated the Body
template with an int argument ( Body!(int) ).
On line 51 and 52 there are two enum declarations:
    enum testAsExpression = "{ ElementType "~parmName ~"; return ("~fun~");}()";
    enum testAsStmts = "{"~ElementType.stringof ~" "~parmName~"; "~fun~"}()";
Note how we're able to use the arguments from the enclosing template.
I'm refering to the use of 'parmName' and 'fun', which are arguments
belonging to unaryFunImpl.
Recall that our initial example code is this:
    unaryFun!("(a & 1) == 0")

We can now easily expand these two enums to something more meaningful:
    enum testAsExpression = "{ ElementType a; return ((a & 1) == 0);}()";
    enum testAsStmts = "{ int a; (a & 1) == 0; }()";

It might be a bit hard to read. These two enums are two versions of a
function (in string form) that will be tested to see if they're valid
and compilable as actual functions.
Notice that in the string we don't specify the function name, or the
function arguments for that matter. This is called a 'lambda', an
anonymous function which doesn't have a name.
Also notice that at the end of the string there's an open and closed
paranthesis '()'. Keep that in mind for now.

Moving to line 57:
Another static if check. __traits is a compiler feature, similar to a
pragma, where you can do some compiler intrinsics and get information
from the actual compiler itself. __traits takes two arguments, the
first can be one of the predefined keywords that defines what you're
asking the compiler to check for (e.g. "give me all members of some
struct", or "hey compiler, does the following code compile?", see
http://www.digitalmars.com/d/2.0/traits.html for more). The second
argument is what the compiler will do intrinsics on. Here, we're using
the 'compiles' keyword, which asks the compiler if it can succesfully
compile an expression. That expression can be anything that's a valid
expression.

This is where our enums come to play. What we want to do, is convert
the enums from their string representation into actual D code that is
compiled in. This is where 'mixin' comes into play. mixin takes any
string, and compiles it in as if it were any regular D code.
For example:
void main()
{
    mixin("int x;");
    writeln( x );  // x exists now

    mixin(" struct Foo { } ");
    Foo foo;  // foo is a Foo type
}

So with the power of __traits(compiles..) and mixin('string'), we're
able to compile strings as if they were D code, and then check if that
code is actually valid. Here's some examples of the two in work:

    static if (__traits(compiles, mixin("int x;")))
    {
    }

    static if (__traits(compiles, mixin("class Foo { void bar() { } }")))
    {
    }

    static if (__traits(compiles, mixin(" { int a; a++; return a; } ")))
    {
    }

Notice the last example was an anonymous function (a lambda).
Ok, we were on line 56, the static if is checking if the enum
'testAsExpression' is compilable as D code. If it's not compilable (if
we passed an invalid string representation of a function in our 'fun'
argument), then we reach the else clause on line 67. We'll hit a
static assert here, which means all compilation stops and you'll get
an error. Here's an example that does just that:

import std.functional;
void main()
{
    string brokenFunctionRepresentation = "(a & 1) == b";
    alias unaryFun!(brokenFunctionRepresentation) isEven;
    isEven(4);
}

> D:\DMD\dmd2\windows\bin\..\..\src\phobos\std\functional.d(74): Error: static assert  "Bad unary function: " ~ brokenFunctionRepresentation ~ " for type " ~ "int"c
> D:\DMD\dmd2\windows\bin\..\..\src\phobos\std\functional.d(87):        instantiated from here: Body!(int)
> testing45.d(10):        instantiated from here: result!(int)

But, we *did* pass a valid argument, so the static if check will
evaluate to true.
Line 59: This enum simply expands to:
    enum string code = "return ( (a & 1) == b );";

The next line can be difficult to grasp, so let me walk you through it.
Here again we're using mixin to compile code that's in string form.
Remember the value of 'testAsExpression':
    enum testAsExpression = "{ ElementType a; return ((a & 1) == 0);}()";

Ok, so how do we compile this in? Well, remember that ElementType is a
type argument passed to the Body template (the one we're in now). And
remember that I've said that for the moment we're pretending that that
argument is an int. This means when mixed in the expression becomes:

{ int a; return ((a & 1) == 0);}();

Remember when I discussed the two enums and how they have the open and
closed paranthesis at the end? These are used to call the function
that was just defined. It's a lambda function, so we have no other
immediate way of calling it, since a lamda doesn't have a name.

Here's some lambda examples that are immediately called:
    import std.functional;
    import std.stdio;
    void main()
    {
        enum lamdaWithCall = " { return 10 * 20; }()";
        writeln( mixin(lamdaWithCall) );  // writes 200

        enum lambdaWithoutCall = " { return 10 * 20; }";
        writeln( mixin(lambdaWithoutCall)() );  // notice paranthesis
outside the mixin
    }

Back to our code at line 60. We're mixing in the enum that has a
lambda and a call. This will be compiled, which means the function
will be called and it will return a value. 'typeof' returns the type
of its argument. Since our function will most likely return an int,
typeof will simply return 'int' (In other words, if the function
returned '1', it's no longer 1, it's the type of 1 which is an int).
Now on the left you see we're using an alias. We're aliasing the int
type to the 'ReturnType' symbol name.

That's it for the Body template. It's function is to parse a string,
try to construct and evaluate a function from the string argument
UnaryFunImpl passes in, and in return it defines two important
declarations that our unaryFunImpl template will need:
1. The actual code that will be used as the function definition ('enum
string code').
2. The return type of this function ('ReturnType').

Back to our UnaryFunImpl template. We've just pretended that Body was
called, but now UnaryFunImpl will call it for the first time.

Line 78: This static if checks if our function takes a reference as an
argument. We have to manually let UnaryFunImpl know about this
(actually we have to let UnaryFun know about it, it passed that
argument down to UnaryFunImpl), which is the purpose of the 'byRef'
argument. Here's an example demonstrating the differences:

    import std.functional;
    import std.stdio;
    void main()
    {
        alias unaryFun!("a = 2") byValue;
        alias unaryFun!("a = 2", true) byReference;

        int x = 5;

        byValue(x);
        assert(x == 5);

        byReference(x);
        assert(x == 2);
    }

After the static if check on line 78 comes the complicated function
definition. The full definition is:
Body!(ElementType).ReturnType result(ElementType)(ref ElementType a)

Let's chop this down into pieces:
'result' is a templated function. It takes a compile-time argument ElementType,
and a runtime argument 'a' which is of type reference of ElementType.
The return type is specified as:
    Body!(ElementType).ReturnType

This is one of the two fields that the Body template defines for us
after we instantiate it. Remember at the very beginning I've noted
said you can use templates on return values, well this is one such use
case. So, the compiler instantiates the Body template with the
ElementType argument, and then looks at the field 'ReturnType'.
Remember, 'Body.ReturnType' is the actual type of the return value of
that function string that was mixed in.

But, where is ElementType defined? We have to go back and look at the
result function definition again:
Body!(ElementType).ReturnType result(ElementType)(ref ElementType a)

Okay, ElementType is figured out from the compile-time argument to
result. So now you're probably wondering "wait a minute. We never
passed any argument to result!". That's right. We're not calling
result yet. We're just *defining a templated function*. Remember that
a template is nothing more then a set of declarations. The 'result'
templated function is just a declaration within UnaryFunImpl.

Now, onto line 82:
Here we instantiate Body with the ElementType argument, we take it's
'code' field (remember Body is used to make two fields, 'code' and
'ReturnType'), and we mix in the code.

This was the case if our function took a ref parameter. Now, let's
jump back to line 40, to unaryFun itself:
    alias unaryFunImpl!(funbody, byRef, parmName).result unaryFun;

Here, unaryFun instantiates unaryFunImpl with a string ('funbody'
here, although unaryFunImpl uses 'fun' as the name) that is going to
be constructed as a lambda and then checked for validity using the
Body template. We also pass two other arguments which are usually set
to default. After the template is instantiated, we want its 'result'
field. 'result' is a full-fledged templated function which we have
just been talking about. We alias this new function to 'unaryFun', and
this is our eponymous template trick where we alias some symbol to the
template name.

Now, up the stack, we've made a complete trip so let's see our
original code again:

alias unaryFun!("(a & 1) == 0") isEven;
assert(isEven(2) && !isEven(1));

We instantiated unaryFun with a string, it calls unaryFunImpl and
passes it the string and some default arguments, unaryFunImpl in turn
calls Body to parse the string and construct the code and return type.
unaryFunImpl then proceeds to construct the 'result' templated
function. unaryFun in turn takes this function and aliases it to
itself. And now we take the template instantiation (which is aliased
to this new function), and alias *that* to isEven.

isEven is now equal to the 'result' templated function. You can prove
it, too! Here's another instantiation of unaryFun:

import std.functional;
import std.stdio;
void main()
{
    alias unaryFun!(" 2 / a") twoDividedByArg;

    assert(twoDividedByArg!(int)(1) == 2);
    assert(twoDividedByArg!(double)(0.5) == 4);

    assert(twoDividedByArg(1) == 2);
    assert(twoDividedByArg(0.5) == 4);
}

The reason the last two calls work is because D has a shorthand for
calling templated functions. If the runtime argument can be determined
at compile-time (in this case these are integral literals, so the
answer is yes), then you don't have to use the bang-parenthesis !()
syntax to call a templated function.

I still haven't explained two things. One is how 'result' is
constructed when called with a function that doesn't take ref
parameters. Let's go to line 87 in functional.d.
This is almost the same definition of 'result' as the one in the
static if body. Except this time it doesn't take a reference a
argument.
On the next line there's some weird mixin:
    mixin("alias __a "~parmName~";");

The reason it's weird is that I haven't explained what the *third*
argument to unaryFun is. The "string parmName" parameter. This one is
used in case you want to use a different character to designate is as
the parameter in your function literal string. Here's how this works:

    import std.functional;
    import std.stdio;
    void main()
    {
        alias unaryFun!(" 2 / a", false) twoDividedByArg;
        alias unaryFun!(" 2 / bcd", false, "bcd") twoDividedByArgB;

        assert(twoDividedByArg!(int)(1) == 2);
        assert(twoDividedByArgB!(double)(0.5) == 4);
    }

So essentially it allows us to use any string of characters instead of
"b". Now these two mixins will make a lot more sense:
    mixin("alias __a "~parmName~";");
    mixin(Body!(ElementType).code);

Remember, 'Body.code' is the code of our function literal. If our
function uses 'bcd' instead of the name that was explicitly defined in
'result', then compilation will fail. See the 'result' definition on
line 87:
    Body!(ElementType).ReturnType result(ElementType)(ElementType __a)

So how do we make our code use any character of strings as the
argument and yet it has to refer to __a? By using an alias, of course!
    mixin("alias __a "~parmName~";"); expands to:
    mixin("alias __a bcd;");

    And mixin(Body!(ElementType).code); expands to:
    return (2 / bcd);

The whole 'result' function then reads:
Body!(ElementType).ReturnType result(ElementType)(ElementType __a)
{
    alias __a bcd;
    return (2 / bcd);
}

And that's the whole story!

I hope this wasn't too difficult to follow. unaryFun might actually be
a rather complicated example to explain. But the good news is, once
you understand one template like unaryFun, you'll start grasping the
rest in no-time. The reason why Phobos has such difficult to read
templates is because writing templates in D is very easy. Before you
know it, your templates end up being hundreds of lines long.

But the benefit over C++ is that you can *read* the template syntax
with little to no missunderstanding. The complications arise when the
templates have a lot of nested templates, and when they make calls to
a lot of other templates. But template calls are easy to trace.
Dissasembling C++ syntax however is not. Well, unless your name is
Andrei Alexandrescu, that is. :)