Unittests and extern(C)

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Wed Jun 21 15:19:48 PDT 2017


Never thought to mention these two things in the same subject line?
Haha, well today I finally have reason to. This post is about an obscure
bug I encountered today in one of my projects, with a moral lesson on
why you really, really, ought to be using unittest blocks everywhere.

First, a bit of background.  The program in which said bug occurred
consists of a module that takes user input, preprocesses it, and
instantiates a code template that produces a D code snippet. This
snippet is then saved into a temporary file, and compiled with the local
D compiler to produce a shared object. Subsequently, it makes use of
Posix's dlopen() family of functions to load the shared object, look up
the symbol of the generated function, and return a function pointer to
it.  The main module then does its own processing in which it calls the
generated code via this function pointer.

The actual code is, of course, rather involved, but here's a
highly-simplified version of it that captures the essentials:

	// The code template.
	//
	// The q"..." syntax is D's built-in heredoc syntax, convenient
	// for multi-line string literals.
	//
	// Basically, this template is just a boilerplate function
	// declaration to wrap around the generated D code snippet. It's
	// written as a format string, with the "%s" specifying where
	// the code snippet should be inserted.
	static immutable string codeTemplate = q"ENDTEMPLATE
	module funcImpl;
	double funcImpl(double x, double y)
	{
		return %s;
	}
	ENDTEMPLATE";

	// A convenient alias for the function pointer type that will be
	// returned by the JIT compiler.
	alias FuncImpl = double function(double, double);

	// Compiles the given input into a shared object, loads it, and
	// returns a function pointer to its entry point.
	FuncImpl compile(string preprocessedInput)
	{
		// Instantiate code template and write it into a
		// temporary source file.
		import std.format : format;
		string code = format(codeTemplate, preprocessedInput);

		import std.file : write;
		enum srcFile = "/path/to/tmpfile.d";
		write(srcFile, code);

		// Compile it into a shared object with the D compiler.
		// Thanks to the wonderful API of std.process, this is a
		// cinch: no need to fiddle with fork(), execv(),
		// waitpid(), etc.
		import std.process;
		enum soFile = "/path/to/tmpfile.so";
		auto ret = execute([
			"/path/to/dmd",
			"-fPIC",
			"-shared",	// make it a shared object
			"-of" ~ soFile,
			srcFile
		]);
		if (ret.status != 0) ... // compile failed

		// Load the result as a shared library
		import core.sys.posix.dlfcn;
		import std.string : toStringz;

		void* lib = dlopen(soFile.toStringz, RTLD_LAZY | RTLD_LOCAL);
		if (lib is null) ... // handle error

		// Look up the symbol of the generated function
		auto symbol = "_D8funcImpl8funcImplFddZd"; // mangled name of funcImpl()
		FuncImpl impl = cast(FuncImpl) dlsym(lib, symbol.toStringz);
		if (impl is null) ... // handle error
		return impl;
	}

	void main(string[] args)
	{
		auto input = getUserInput(...);
		auto snippet = preprocessInput(input);
		auto impl = compile(snippet);

		... // do stuff
		auto result = impl(x, y); // call generated function
		... // more stuff
	}

The symbol "_D8funcImpl8funcImplFddZd" is the mangled version of
funcImpl(). A mangled name is a way of encoding a function
signature into a legal identifier for an object file symbol -- the
system linker does not (and should not) understand D overloading rules,
for example, so the compiler needs to generate a unique name for every
function overload. Generally, the D compiler takes care of this for us,
so we never have to worry about it in usual D code, and can simply use
the human-readable name "funcImpl", or the fully-qualified name
"funcImpl.funcImpl". However, since dlsym() doesn't understand the D
mangling scheme, in this case we need to tell it the mangled name so
that it can find the right symbol in the shared object.
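You don't have to take the mangled name on faith: D exposes it through the `.mangleof` property. The sketch below is a standalone illustration, not code from the program above; it uses a hypothetical module named `demo` so the pieces of the name are easy to pick out, and it assumes a plain (non-template) D function, for which the compiler adds no inferred-attribute codes to the name.

```d
module demo; // hypothetical module name, chosen for readability

import std.stdio : writeln;

// Same signature as the generated funcImpl().
double funcImpl(double x, double y)
{
    return y - x*x;
}

// _D        : prefix for a D-mangled symbol
// 4demo     : length-prefixed module name
// 8funcImpl : length-prefixed function name
// F         : function with D linkage
// dd        : two double parameters
// Z         : end of the parameter list
// d         : double return type
static assert(funcImpl.mangleof == "_D4demo8funcImplFddZd");

void main()
{
    writeln(funcImpl.mangleof);
}
```

The hardcoded string in compile() follows the same pattern; the `8funcImpl8funcImpl` part comes from the generated module and the function both being named `funcImpl`.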

This was the original version of the program. So far so good.

In this version of the program, I didn't write many unittests for
compile(), because I felt it was ugly for unittests to have side-effects
on the host system (creating / deleting files, running external
programs, etc.). So, to my shame, this part of the code had rather poor
unittest coverage, which would eventually lead to trouble...

...

Then recently, I rewrote preprocessInput to do smarter things with the
user input.  In the course of doing that, I decided to clean up the
above bit of code in compile() that deals with mangled function names.
Since the code template doesn't need to do any overloading, we
don't actually have to deal with mangled names at all; we could just use
D's handy "extern(C)" declaration to tell the compiler to use C-style
function names (i.e., no mangling, no overloading) for the function
instead of the usual D mangling scheme. Then we don't have to work with
the ugly unreadable mangled name and can just ask dlsym() directly for
the symbol "funcImpl".
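The same `.mangleof` check shows what `extern(C)` does to the symbol name. This is again a hypothetical standalone `demo` module rather than the real generated code, and the expected value reflects what one would expect on a Posix platform such as Linux, where dlsym() is given C symbol names as they appear in the source.

```d
module demo; // hypothetical module name

import std.stdio : writeln;

// Same signature as before, but with C linkage: the compiler emits the
// bare identifier as the symbol name instead of a D-mangled one.
extern(C) double funcImpl(double x, double y)
{
    return y - x*x;
}

static assert(funcImpl.mangleof == "funcImpl");

void main()
{
    // From D code, the call itself looks exactly the same as before.
    writeln(funcImpl(2.0, 4.0));
}
```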

So all we need is to add "extern(C)" to the code template:

	static immutable string codeTemplate = q"ENDTEMPLATE
	module funcImpl;
	extern(C) double funcImpl(double x, double y)
	{
		return %s;
	}
	ENDTEMPLATE";

Then change the declaration of `symbol` to just:

	auto symbol = "funcImpl"; // much more readable!

Should be a harmless change, right? Right...?

Well, initially, I didn't notice anything amiss.  Because compile()
lacked sufficient unittest coverage (e.g., a test that runs the
compilation code on a trivial code fragment like "y - x*x", then checks
that calling impl() returns the expected answer), I didn't notice at
first that the program output had gone horribly wrong.  As luck would
have it, at the time I just so happened to be testing inputs whose
corresponding
functions were symmetrical with respect to the arguments x and y, i.e.,
funcImpl(x,y) == funcImpl(y,x).  No errors were found in the output.
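Why did those inputs hide the problem? If a function is symmetric in its arguments, calling it with the arguments in either order gives the same answer, so its output cannot reveal which argument went where. The two functions below are made-up stand-ins for generated code, not taken from the actual program; the second one is the `y - x*x` fragment mentioned above.

```d
import std.stdio : writeln;

// Symmetric in x and y: swapping the arguments never changes the result.
double symmetric(double x, double y)
{
    return x*y + x + y;
}

// Not symmetric: swapping the arguments generally changes the result.
double asymmetric(double x, double y)
{
    return y - x*x;
}

void main()
{
    // A bug that swaps x and y is invisible here (both calls give 14)...
    assert(symmetric(2.0, 4.0) == symmetric(4.0, 2.0));

    // ...but obvious here: 4 - 2*2 == 0, whereas 2 - 4*4 == -14.
    assert(asymmetric(2.0, 4.0) == 0.0);
    assert(asymmetric(4.0, 2.0) == -14.0);

    writeln("all checks passed");
}
```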

But later when I ran it on something that *wasn't* symmetrical, I
suddenly noticed that the results were all wrong.  It seemed that x and
y were getting swapped for some reason.  My first guess was that the new
preprocessing code had some bugs in it.  So I ran git bisect to find the
error... unfortunately, I had done the rewrite of the preprocessing code
in a separate module, which wasn't integrated with the main program
until the very end.  So when git bisect indicated that the bug first
appeared in the commit that first connected the new code to the main
program (the extern(C) change was made as part of this commit, since it
was such a small change), it still didn't answer the question of where
exactly the bug was.  I checked and double-checked the code to make sure
that I didn't screw up somewhere and exchanged x and y... but found
nothing.

Long story short, eventually I found that the offending change was when
I added "extern(C)" to codeTemplate. And then, it all became instantly
clear:

The code template says:

	extern(C) double funcImpl(double x, double y)

But the function pointer type is declared as:

	alias FuncImpl = double function(double, double);

Notice the lack of `extern(C)` in the latter. The call to dlsym(), of
course, simply casts the return value to FuncImpl, because dlsym()
doesn't know anything about what the symbol's type might be, and just
returns void*:

	FuncImpl impl = cast(FuncImpl) dlsym(lib, symbol.toStringz);

Here's a lesson I hadn't thought about before: since extern(C) is
intended for C interoperability, it not only suppresses D symbol
mangling, but also *changes the ABI* of the function to match the host
platform's C calling convention.  That convention may or may not be the
same as the D calling convention. I had assumed that the two were
more-or-less the same as long as you weren't passing D-specific types
between the functions, but in this case I was wrong: with the DMD
compiler I was using, the *order* in which parameters are passed to a D
function is the *opposite* of the order used for a C function.
Naturally, this applies to all extern(C) functions, including those
written in D but annotated with extern(C).

As a result, when the main program invoked impl(x,y), the reversed
parameter order of extern(C) caused the effective call to become
impl(y,x), even though nowhere in the main program are x and y ever
swapped!

Once I realized this, the fix was trivial: add `extern(C)` to the
function pointer type:

	alias FuncImpl = extern(C) double function(double, double);

And everything worked as it should again.
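It's worth spelling out why the compiler didn't catch this. In D, a function pointer's linkage is part of its type, so mixing up `extern(C)` and D linkage is normally a compile-time error; it was the cast of dlsym()'s `void*` result that switched that check off. The sketch below uses made-up names (`cFunc`, `DLinkage`, `CLinkage`) and checks this with `static assert`s. It deliberately never calls through the mismatched pointer, since that is exactly the bug described above.

```d
import std.stdio : writeln;

// A function with C linkage, like funcImpl() after the extern(C) change.
extern(C) double cFunc(double x, double y)
{
    return y - x*x;
}

alias DLinkage = double function(double, double);            // like the original FuncImpl
alias CLinkage = extern(C) double function(double, double);  // like the fixed FuncImpl

// Linkage is part of the type, so these are two distinct types...
static assert(!is(DLinkage == CLinkage));

// ...so the compiler rejects an implicit conversion between them...
static assert(!__traits(compiles, { DLinkage p = &cFunc; }));
static assert( __traits(compiles, { CLinkage p = &cFunc; }));

// ...but an explicit cast through void* (which is what dlsym() returns)
// is accepted without complaint.
static assert( __traits(compiles, { DLinkage p = cast(DLinkage) cast(void*) &cFunc; }));

void main()
{
    // Only call through the correctly typed pointer.
    CLinkage impl = &cFunc;
    writeln(impl(2.0, 4.0)); // y - x*x with x = 2, y = 4
}
```

In other words, the cast in compile() is the one place where the function pointer type must be kept in sync with the generated code by hand, which is why a unittest that actually calls the result is so valuable.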

Finally, the moral of the story is that had I written unittests for
compile(), I would have caught this bug much earlier than I did. It's as
simple as writing:

	unittest
	{
		auto impl = compile("y - x*x");
		assert(impl(2.0, 4.0) == 0.0);
	}

Then when I forgot to add extern(C) to the declaration of FuncImpl, the
unittest failure would have pinpointed the problem immediately and saved
me the trouble of git bisecting and inspecting the code to find the bug.
Needless to say, this unittest is now in my codebase -- who cares if
it's ugly that the unittest writes to the filesystem and invokes an
external program; the benefit of catching obscure bugs like this one
(and preventing such bugs from resurfacing later) far outweighs any
ivory tower idealistic notions about unittests having no side-effects.

TL;DR: (1) extern(C) not only suppresses symbol mangling, it also
changes the ABI of the annotated function; (2) always, *always*, write
unittests.  Just do it. Even supposedly "ugly" or "dirty" ones. Compile
with -cov to shame yourself into writing more unittests. (My compile()
function had at one point 0% test coverage. That should have raised a
big red flag.)


P.S. @MikeParker: this time I wrote this post with the D blog in mind. I
know it probably needs more polish, but hopefully this is something you
can work with? ;-)


T


