Things that make writing a clean binding system more difficult
Ethan Watson via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jul 28 01:33:22 PDT 2016
As mentioned in the D blog the other day, the binding system as
used by Remedy will both be open sourced and effectively
completely rewritten from when we shipped Quantum Break. As I'm
still deep within that rewrite, a bunch of things are still fresh
in my mind that aren't that great when it comes to D and doing
such a system.
These are things I also expect other programmers to come across
in one way or another, since they seem like a simple way to
do things, but getting them to behave requires non-trivial
workarounds.
I also assume "lodge a bug" will be the response to these. But
there are some cases where I think documentation or
easily-googleable articles will be required instead/as well. And
in the case of one of these things, it's liable to start a long
circular conversation chain.
====================
1) Declaring a function pointer with a ref return value can't be
done without workarounds.
Try compiling this:
ref int function( int, int ) functionPointer;
It won't let you, because only parameters and foreach loop
variables can be declared ref. Despite the fact that I intend the
function pointer to be of a kind that returns a ref int, I can't
declare that easily. Easy, declare an alias, right?
alias RefFunctionPointer = ref int function( int, int );
Alright, cool, that works. But thanks to the binding system
making heavy use of function pointers via compile-time generated
code, that means we then have to come up with a unique name for
every function pointer symbol we'll need. Eep.
Rather, I have to do something like this:
template RefFunctionPointer( Params... ) if( Params.length > 1 )
{
    ref Params[ 0 ] dodgyFunction( Params[ 1 .. $ ] );
    alias RefFunctionPointer = typeof( &dodgyFunction );
}

RefFunctionPointer!( int, int, int ) functionPointer;
This can also alternately be done by generating a mixin string
for the alias inside of the template and not requiring a dummy
function to get the type from. Either way, it gets rid of the
unique name requirement but now we have template expansion in the
mix. Which is something I'll get to in a second...
Needless to say, this is something I wasted a lot of time on
three years ago when I was getting the bindings up to speed
originally. Turns out it's not any better in DMD 2.071.
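For reference, the mixin-string variant mentioned above could look roughly like the following. This is only a sketch under assumptions: RefFunctionPointerMixin, storage and pick are made-up names for illustration, and it relies on the compiler accepting ref in an alias declaration the same way the plain alias example does.

```d
import std.stdio : writeln;

// Hypothetical variant: the alias comes from a string mixin, so no dummy
// function is needed to take the type from.
template RefFunctionPointerMixin( Params... ) if( Params.length > 1 )
{
    mixin( "alias RefFunctionPointerMixin = "
        ~ "ref Params[ 0 ] function( Params[ 1 .. $ ] );" );
}

// Made-up backing storage and ref-returning function, for demonstration only.
int[ 4 ] storage;

ref int pick( int a, int b )
{
    return storage[ ( a + b ) % 4 ];
}

void main()
{
    RefFunctionPointerMixin!( int, int, int ) functionPointer = &pick;

    // The call yields a reference, so it can be assigned through.
    functionPointer( 1, 2 ) = 42;
    writeln( storage[ 3 ] );
}
```

Either spelling still instantiates a template per signature, so it does nothing for the expansion cost discussed next.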
====================
2) Expansion of code (static foreach, templates) is slow to the
point where string mixins are a legitimate compile-time
optimisation
Take an example of whittling down a tuple/variable argument list.
Doing it recursively would look something like this:
template SomeEliminator( Symbols... )
{
    static if( Symbols.length >= 1 )
    {
        static if( SomeCondition!( Symbols[ 0 ] ) )
        {
            // Keep the head, then recurse on the tail.
            alias SomeEliminator = TypeTuple!( Symbols[ 0 ],
                SomeEliminator!( Symbols[ 1 .. $ ] ) );
        }
        else
        {
            // Drop the head, then recurse on the tail.
            alias SomeEliminator = SomeEliminator!( Symbols[ 1 .. $ ] );
        }
    }
    else
    {
        alias SomeEliminator = TypeTuple!( );
    }
}
Okay, that works, but the template expansion is a killer on
compile-time performance. It's legitimately far quicker on the
compiler to do this:
template SomeEliminator( Symbols... )
{
    string SymbolSelector()
    {
        import std.array : join;

        string[] strOutputs;
        foreach( iIndex, Symbol; Symbols )
        {
            static if( SomeCondition!( Symbol ) )
            {
                // Emit an index into Symbols for every element that is kept.
                strOutputs ~= "Symbols[ " ~ iIndex.stringof ~ " ]";
            }
        }
        return strOutputs.join( ", " );
    }

    mixin( "alias SomeEliminator = TypeTuple!( " ~ SymbolSelector()
        ~ " );" );
}
With just a small codebase that I'm working on here, it chops
seconds off the compile time. Of course, maybe there's something
I'm missing here about variable parameter parsing and doing it
without a mixin is quite possible and just as quick as the mixin,
but that would make it the third method I know of to achieve the
same effect. The idiomatic way of doing this without mixins
should at least be defined, and optimised at the compiler level
so that people don't get punished for writing natural D code.
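One existing non-mixin spelling of this is Filter from std.meta in Phobos. The sketch below only shows that it produces the same kind of result. isIntegral stands in for SomeCondition (an assumption for illustration), and no claim is made here about how its compile time compares with the mixin version.

```d
import std.meta : AliasSeq, Filter;
import std.traits : isIntegral;

// isIntegral stands in for SomeCondition (an assumption for illustration).
alias Kept = Filter!( isIntegral, int, float, long, string );

// Only the types that satisfy the predicate survive, in their original order.
static assert( is( Kept == AliasSeq!( int, long ) ) );

void main() {}
```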
Then there was this one that I came across:
outofswitch: switch( symbolName )
{
    foreach( Variable; VariablesOf!( SearchType ) )
    {
        case Variable.Name:
            doSomething!( Variable.Type )();
            break outofswitch;
    }
    default:
        writeln( symbolName, " was not found!" );
        break;
}
This caused compile time to blow way out. How far out? By
rewriting it like this, I cut compile times in half (at that
point, from 10 seconds to 5):
switch( symbolName )
{
    mixin( generateSwitchFor!( SearchType )() );
    default:
        writeln( symbolName, " was not found!" );
        break;
}
Now, I love mixins, both template form and string form. The
binding system uses them extensively. But mixins like this are
effectively a hack. Having to break out a mixin because my
compile time doubled from a seemingly simple piece of code is
not good.
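To make the shape of that workaround concrete, here is a minimal sketch of what a generator like generateSwitchFor could look like. Everything in it is an assumption for illustration: the SearchType fields, the typeNameOf wrapper and the generated case bodies are made up, and the real binding code is not shown in this post.

```d
import std.stdio : writeln;

// Made-up type to search; stands in for SearchType.
struct SearchType
{
    int health;
    float speed;
}

// Hypothetical generator: builds one case per member name of T as a string.
string generateSwitchFor( T )()
{
    string output;
    foreach( name; __traits( allMembers, T ) )
    {
        output ~= "case \"" ~ name ~ "\": "
            ~ "result = typeof( T." ~ name ~ " ).stringof; break;\n";
    }
    return output;
}

// Returns the type name of the member called symbolName, or "" if absent.
string typeNameOf( T )( string symbolName )
{
    string result;
    switch( symbolName )
    {
        mixin( generateSwitchFor!T() );
    default:
        writeln( symbolName, " was not found!" );
        break;
    }
    return result;
}

void main()
{
    assert( typeNameOf!SearchType( "health" ) == "int" );
    assert( typeNameOf!SearchType( "speed" ) == "float" );
    assert( typeNameOf!SearchType( "missing" ) == "" );
}
```

The generator runs once through CTFE and hands the compiler a flat list of cases, rather than having the foreach body expanded per member inside the switch.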
====================
3) __ctfe is not a CTFE symbol.
This one bit me when I was trying to be efficient for runtime
usage while allowing a function to also be usable at compile time.
int[] someArray;
static if( !__ctfe )
{
    someArray.reserve( someAmount );
}
Reserve not working at compile time? Eh, I can live with that.
__ctfe not being a symbol I can static if with? Well, sure, I
suppose that would work if the compiler wouldn't even try
compiling the code inside the __ctfe block. But it doesn't do
that. It does symbol resolution, and then your code doesn't run
at compile time. It's at that point where I ask why have the
__ctfe symbol if you can only use it effectively at runtime?
Doesn't that only make it half useful?
I understand this is a longstanding complaint too, so this serves
as a reminder.
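For comparison, the form the language documentation describes is a plain runtime if on __ctfe. The sketch below uses a made-up buildSquares function and assumes a reasonably recent compiler; behaviour on the DMD 2.071 release discussed here may differ. It does not address the complaint itself: the guarded block is still compiled and symbol-resolved, it is merely skipped when the function is interpreted.

```d
// Made-up example function, usable both at run time and through CTFE.
int[] buildSquares( size_t count )
{
    int[] result;
    if( !__ctfe )
    {
        // Still compiled, but only executed at run time.
        result.reserve( count );
    }
    foreach( i; 0 .. count )
    {
        result ~= cast( int )( i * i );
    }
    return result;
}

// Forces evaluation at compile time.
enum compileTimeSquares = buildSquares( 4 );
static assert( compileTimeSquares == [ 0, 1, 4, 9 ] );

void main()
{
    // Same function, evaluated at run time.
    assert( buildSquares( 4 ) == [ 0, 1, 4, 9 ] );
}
```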
====================
4) Forward declaring a function prototype means I can never
declare that function elsewhere (say, for example, with a mixin)
The binding system works something like this:
* Declare a function, mark it with a @BindImport UDA.
* Compile time code scans over objects and symbols looking for
functions tagged @BindImport.
* Generate __gshared function pointers that match the signature
(and rewrite parameters to pass this in where applicable).
* Generate function definitions that call the function pointers
(with this if it's a method), allowing a programmer to just call
the function declaration like it was any old ordinary piece of D
code.
That fourth part is where it all falls over.
We shipped Quantum Break by defining your imports with a
preceding underscore (ie @BindImport int _doSomeAction();) and
generated function definitions with the exact same signature
minus the underscore. The new way I'm doing it is to define all
these functions in a sub-struct so that all I need to rewrite is
the parameters.
All this because I cannot later define a forward-declared
function.
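The sub-struct approach can be sketched roughly as follows. This is an illustration under assumptions, not the Remedy implementation: BindImport is reduced to an empty UDA, and Imports, GenerateBinding and doubler are made-up names. It also skips the scanning step and the rewriting of parameters to pass this in.

```d
import std.traits : hasUDA, Parameters, ReturnType;

// Hypothetical stand-in for the real UDA.
struct BindImport {}

// Declarations live in a sub-struct so the generated wrappers can reuse
// the same names at module scope.
struct Imports
{
    @BindImport static int doSomeAction( int value );
}

// Made-up generator: emits a __gshared function pointer plus a wrapper
// that forwards to it, both matching the tagged declaration's signature.
mixin template GenerateBinding( alias declaration, string name )
{
    static assert( hasUDA!( declaration, BindImport ) );

    mixin( "__gshared ReturnType!declaration function( "
        ~ "Parameters!declaration ) " ~ name ~ "Ptr;" );
    mixin( "ReturnType!declaration " ~ name
        ~ "( Parameters!declaration args ) { return "
        ~ name ~ "Ptr( args ); }" );
}

mixin GenerateBinding!( Imports.doSomeAction, "doSomeAction" );

// Stand-in for the native function the pointer would be bound to.
int doubler( int value )
{
    return value * 2;
}

void main()
{
    doSomeActionPtr = &doubler;
    assert( doSomeAction( 21 ) == 42 );
}
```

The body-less declaration inside Imports is only inspected at compile time and never called, so nothing needs to define it.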
.di files are not a language feature (the documentation notes
they are explicitly a compiler feature), and they don't match
what I need here anyway, as they're generated from complete code
with no possibility of using them in more of a .cpp/.h paradigm.
So they're out of the question. *Unless* they're upgraded to a
language feature and allow me to define full class declarations
with later implementations of some/all functions.
This also isn't the only use case I have. I'm a game engine
programmer. We write a lot of abstracted interfaces with platform
specific implementations. I know, I know, version(X){} your code,
right? But that's not how everyone works. Some implementations
really do require their own file for maintenance and legal
purposes. But for an example outside of gaming? Take a look at
core.atomic. Two implementations in the same file *AND* a
separate one for documentation purposes. LDC's core.atomic also
has an LLVM definition in there. And if someone writes native ARM
support for DMD, that'll be even more implementations in the same
file. Take note of the duplicated enums and deprecations between
definitions, the alternative to which is to put a version block
inside every function that requires special behaviour. Either way
is not clean, I tells ya.
I'm sure there are many cases where declaration and later
definition is also a perfectly valid programming pattern, and I
don't see at all how these use cases can conflict with D's
forward referencing, since it doesn't change referencing rules at
all. It only changes the definition rules.