Things that make writing a clean binding system more difficult

Ethan Watson via Digitalmars-d digitalmars-d at puremagic.com
Thu Jul 28 01:33:22 PDT 2016


As mentioned in the D blog the other day, the binding system used 
by Remedy will both be open sourced and effectively rewritten 
from scratch compared to what we shipped with Quantum Break. As 
I'm still deep in that rewrite, a bunch of things that aren't so 
great about building such a system in D are still fresh in my 
mind.

These are things I expect other programmers to come across in one 
way or another: each seems like a simple way to do things, but 
getting it to behave requires non-trivial workarounds.

I also assume "lodge a bug" will be the response to these. But 
there are some cases where I think documentation or 
easily-googleable articles will be required instead/as well. And 
in the case of one of these things, it's liable to start a long 
circular conversation chain.

====================
1) Declaring a function pointer with a ref return value can't be 
done without workarounds.

Try compiling this:

ref int function( int, int ) functionPointer;

It won't let you, because only parameters and foreach 
declarations can be ref. Despite the fact that I intend the 
function pointer to be of a kind that returns a ref int, I can't 
declare that easily. Easy, declare an alias, right?

alias RefFunctionPointer = ref int function( int, int );

Alright, cool, that works. But thanks to the binding system 
making heavy use of function pointers via code-time generated 
code, that means we then have to come up with a unique name for 
every function pointer symbol we'll need. Eep.

Rather, I have to do something like this:

template RefFunctionPointer( Params... ) if( Params.length >= 1 )
{
   ref Params[ 0 ] dodgyFunction( Params[ 1 .. $ ] );
   alias RefFunctionPointer = typeof( &dodgyFunction );
}
RefFunctionPointer!( int, int, int ) functionPointer;

This can alternatively be done by generating a mixin string for 
the alias inside the template, removing the need for a dummy 
function to take the type from. Either way, it gets rid of the 
unique name requirement, but now we have template expansion in 
the mix. Which is something I'll get to in a second...
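For reference, the mixin-string variant looks something like this — a sketch that builds the alias declaration as a string so no dummy function is needed (makeAlias is my name for the hypothetical helper):

```d
import std.conv : to;

template RefFunctionPointer( Params... ) if( Params.length >= 1 )
{
   // Build "alias RefFunctionPointer = ref Params[ 0 ] function( ... );"
   // as a string; the alias form of the declaration is accepted where the
   // plain variable declaration is not.
   static string makeAlias()
   {
     string params;
     foreach( i; 1 .. Params.length )
     {
       if( params.length > 0 ) params ~= ", ";
       params ~= "Params[ " ~ to!string( i ) ~ " ]";
     }
     return "alias RefFunctionPointer = ref Params[ 0 ] function( "
            ~ params ~ " );";
   }
   mixin( makeAlias() );
}

RefFunctionPointer!( int, int, int ) functionPointer;
```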

Needless to say, this is something I wasted a lot of time on 
three years ago when I was getting the bindings up to speed 
originally. Turns out it's not any better in DMD 2.071.

====================
2) Expansion of code (static foreach, templates) is slow to the 
point where string mixins are a legitimate compile-time 
optimisation

Take an example of whittling down a tuple/variable argument list. 
Doing it recursively would look something like this:

template SomeEliminator( Symbols... )
{
   static if( Symbols.length >= 1 )
   {
     static if( SomeCondition!( Symbols[ 0 ] ) )
     {
       // Keep the first symbol and recurse on the rest
       alias SomeEliminator = TypeTuple!( Symbols[ 0 ],
                                SomeEliminator!( Symbols[ 1 .. $ ] ) );
     }
     else
     {
       // Drop the first symbol and recurse on the rest
       alias SomeEliminator = SomeEliminator!( Symbols[ 1 .. $ ] );
     }
   }
   else
   {
     alias SomeEliminator = TypeTuple!( );
   }
}

Okay, that works, but the template expansion is a killer on 
compile-time performance. It's legitimately far quicker on the 
compiler to do this:

template SomeEliminator( Symbols... )
{
   string SymbolSelector()
   {
     import std.array : join;
     string[] strOutputs;
     foreach( iIndex, Symbol; Symbols )
     {
       static if( SomeCondition!( Symbol ) )
       {
         strOutputs ~= "Symbols[ " ~ iIndex.stringof ~ " ]";
       }
     }
     return strOutputs.join( ", " );
   }
   mixin( "alias SomeEliminator = TypeTuple!( " ~ SymbolSelector() 
~ " );" );
}

Even with the fairly small codebase that I'm working on here, it 
chops seconds off the compile time. Of course, maybe there's 
something I'm missing about variadic parameter handling, and 
doing it without a mixin is quite possible and just as quick, but 
that would make it the third method I know of to achieve the same 
effect. The idiomatic way of doing this without mixins should at 
least be defined, and optimised at the compiler level, so that 
people don't get punished for writing natural D code.

Then there was this one that I came across:

outofswitch: switch( symbolName )
{
   foreach( Variable; VariablesOf!( SearchType ) )
   {
     case Variable.Name:
       doSomething!( Variable.Type )();
       break outofswitch;
   }
   default:
     writeln( symbolName, " was not found!" );
     break;
}

This caused compile time to blow way out. How far out? By 
rewriting it like this, I cut compile times in half (at that 
point, from 10 seconds to 5):

switch( symbolName )
{
   mixin( generateSwitchFor!( SearchType )() );
   default:
     writeln( symbolName, " was not found!" );
     break;
}
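generateSwitchFor is one of our own helpers, but the shape of such a generator is roughly this — a self-contained sketch where Var, VariablesOf and doSomething are toy stand-ins, not the real binding code:

```d
import std.meta : AliasSeq;
import std.stdio : writeln;

// Toy stand-ins so the sketch compiles; the real VariablesOf and
// doSomething belong to the binding system and aren't shown here.
struct Var( string N, T ) { enum Name = N; alias Type = T; }
alias VariablesOf( SearchType ) = AliasSeq!( Var!( "health", int ),
                                             Var!( "speed", float ) );
void doSomething( T )() { writeln( "matched a ", T.stringof ); }

// Walk the compile-time list once and emit plain case statements as a
// string, avoiding the foreach-inside-switch template expansion entirely.
string generateSwitchFor( SearchType )()
{
   string strOutput;
   foreach( iIndex, Variable; VariablesOf!( SearchType ) )
   {
     strOutput ~= "case \"" ~ Variable.Name ~ "\": "
                ~ "doSomething!( VariablesOf!( SearchType )[ "
                ~ iIndex.stringof ~ " ].Type )(); break;\n";
   }
   return strOutput;
}

void lookup( SearchType )( string symbolName )
{
   switch( symbolName )
   {
     mixin( generateSwitchFor!( SearchType )() );
     default:
       writeln( symbolName, " was not found!" );
       break;
   }
}
```

The mixed-in cases land directly in the surrounding switch, so from the compiler's point of view it's just a handful of plain case statements.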

Now, I love mixins, both template form and string form. The 
binding system uses them extensively. But mixins like this are 
effectively a hack. Any time I have to break out a mixin because 
a seemingly simple piece of code doubled my compile time, 
something is not right.

====================
3) __ctfe is not a CTFE symbol.

This one bit me when I was trying to be efficient for runtime 
usage while allowing a function to also be usable at compile time.

int[] someArray;
static if( !__ctfe ) // error: __ctfe can't be read at compile time
{
   someArray.reserve( someAmount );
}

Reserve not working in compile time? Eh, I can live with that. 
__ctfe not being a symbol I can static if with? Well, sure, I 
suppose that would work if the compiler wouldn't even try 
compiling the code inside the __ctfe block. But it doesn't do 
that. It does symbol resolution, and then your code doesn't run 
at compile time. It's at that point where I ask why have the 
__ctfe symbol if you can only use it effectively at runtime? 
Doesn't that only make it half useful?
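For anyone hitting the same wall, the form that does work is a plain runtime if: the CTFE interpreter evaluates __ctfe as true and never enters the branch, so the branch still has to compile, but it never executes at compile time. A minimal sketch:

```d
int[] makeArray( size_t someAmount )
{
   int[] someArray;
   if( !__ctfe )
   {
     // Only taken at runtime; during CTFE the interpreter skips this,
     // so reserve never actually executes at compile time.
     someArray.reserve( someAmount );
   }
   foreach( i; 0 .. someAmount )
   {
     someArray ~= cast( int )i;
   }
   return someArray;
}

enum ctArray = makeArray( 4 );      // evaluated in CTFE, reserve skipped
static assert( ctArray == [ 0, 1, 2, 3 ] );
```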

I understand this is a longstanding complaint too, so this serves 
as a reminder.

====================
4) Forward declaring a function prototype means I can never 
declare that function elsewhere (say, for example, with a mixin)

The binding system works something like this:

* Declare a function, mark it with a @BindImport UDA.
* Compile time code scans over objects and symbols looking for 
functions tagged @BindImport.
* Generate __gshared function pointers that match the signature 
(and rewrite parameters to pass this in where applicable).
* Generate function definitions that call the function pointers 
(with this if it's a method), allowing a programmer to just call 
the function declaration like it was any old ordinary piece of D 
code.
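The first couple of steps can be sketched like so — toy names, not Remedy's actual API (BindImport, Bindings and ImportedNames are all illustrative):

```d
import std.traits : hasUDA;

enum BindImport; // marker UDA

struct Bindings
{
   @BindImport int doSomeAction( int value ); // import: declaration only
   void ordinaryFunction() { }                // untagged, ignored by the scan
}

// Compile-time scan: collect the names of every @BindImport-tagged member,
// ready for function pointer and wrapper generation.
template ImportedNames( T )
{
   string[] collect()
   {
     string[] names;
     foreach( memberName; __traits( allMembers, T ) )
     {
       static if( hasUDA!( __traits( getMember, T, memberName ),
                           BindImport ) )
       {
         names ~= memberName;
       }
     }
     return names;
   }
   enum ImportedNames = collect();
}

static assert( ImportedNames!Bindings == [ "doSomeAction" ] );
```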

That fourth part is where it all falls over.

We shipped Quantum Break by declaring your imports with a 
preceding underscore (i.e. @BindImport int _doSomeAction();) and 
generating function definitions with the exact same signature 
minus the underscore. The new way I'm doing it is to define all 
these functions in a sub-struct so that all I need to rewrite is 
the parameters.

All this because I cannot later define a forward-declared 
function.

.di files are both not a language feature (documentation notes it 
is explicitly a compiler feature), and don't even match what I 
need here as they're generated from complete code with no 
possibility of using them in more of a .cpp/.h paradigm. So 
they're out of the question. *Unless* they're upgraded to a 
language feature and allow me to define full class declarations 
with later implementations of some/all functions.

This also isn't the only use case I have. I'm a game engine 
programmer. We write a lot of abstracted interfaces with platform 
specific implementations. I know, I know, version(X){} your code, 
right? But that's not how everyone works. Some implementations 
really do require their own file for maintenance and legal 
purposes. But for an example outside of gaming? Take a look at 
core.atomic. Two implementations in the same file *AND* a 
separate one for documentation purposes. LDC's core.atomic also 
has an LLVM definition in there. And if someone writes native ARM 
support for DMD, that'll be even more implementations in the same 
file. Take note of the duplicated enums and deprecations between 
definitions, the alternative to which is to put a version block 
inside every function that requires special behaviour. Either way 
is not clean, I tells ya.

I'm sure there's many cases where declaration and later 
definition is also a perfectly valid programming pattern, and I 
don't see at all how these use cases can conflict with D's 
forward referencing since it doesn't change referencing rules at 
all, it only changes the definition rules.

