Another "D is cool" post

Mon May 29 12:07:03 PDT 2017

So, recently in one of my pet projects I wrote a bit of code that takes a
string, fills in a code template, invokes the D compiler to create a
shared object, then loads the object with dlopen() and calls dlsym() to
get the entry point into the compiled code as a function pointer.
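
For the curious, the loading step looks roughly like the sketch below. This
is not the actual project code: loadEntryPoint and the EntryPoint signature
are made-up names for illustration, and it assumes a Posix system where
plain dlopen() is enough to bring in the generated object.

	import core.sys.posix.dlfcn : dlclose, dlerror, dlopen, dlsym, RTLD_LAZY;
	import std.string : fromStringz, toStringz;

	// Hypothetical entry-point signature; the real one depends on the template.
	alias EntryPoint = double function(double);

	/// Loads soFile and looks up mangledName in it; throws on failure.
	/// On success the handle is deliberately kept open, since the returned
	/// function pointer is only valid while the object stays loaded.
	EntryPoint loadEntryPoint(string soFile, string mangledName)
	{
		void* handle = dlopen(soFile.toStringz, RTLD_LAZY);
		if (handle is null)
			throw new Exception("dlopen failed: " ~ fromStringz(dlerror()).idup);

		void* sym = dlsym(handle, mangledName.toStringz);
		if (sym is null)
		{
			dlclose(handle);
			throw new Exception("symbol not found: " ~ mangledName);
		}
		return cast(EntryPoint) sym;
	}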

I've written the equivalent code in C before -- and while it worked, it
required a lot of care to ensure things don't blow up, or if they do,
that errors are correctly handled and properly cleaned up. This time
round, though, I found several aspects of D very nice for writing this
sort of code:

1) String-processing: the input string needs some processing before
being put into the code template, so here I get to use nice built-in
facilities like std.array.split. Whereas in the C version, I'd have to
use things like strchr or strstr, carefully make copies of the results
or, if the input string was writable, insert null terminators (which
makes for a poorer API from a design POV), then take care to free
buffers afterwards.  Normal course of the day for writing C code, it's
true, but rather fidgety, error-prone, and, after having written string
manipulation C code for the n'th time, it gets tiring really fast. The
resulting code in D is much simpler (only 4-5 lines; the C equivalent
would probably take about twice that, what with null checking, copying
buffers, etc.), more readable, and more maintainable.
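
To give a flavor of it, here is a minimal sketch. The "name = expression"
input format, the template text, and the fillTemplate name are all invented
for this example; the real project's input and template differ.

	import std.array : split;
	import std.format : format;
	import std.string : strip;

	/// Turns an input like "f = x * 2" into D source for a one-line function.
	/// (Hypothetical input format, for illustration only.)
	string fillTemplate(string input)
	{
		auto parts = input.split("=");
		if (parts.length != 2)
			throw new Exception("expected 'name = expression', got: " ~ input);

		// No manual buffer management: the pieces are slices of the input,
		// and the GC owns the formatted result.
		return format("module generated;\ndouble %s(double x) { return %s; }\n",
			parts[0].strip, parts[1].strip);
	}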

2) Instantiating the code template and invoking the compiler: in D, it
just takes a single call to std.file.write() to write the code template
to a temporary file. Very handy. In C, I'd have to worry about
dealing with FILE*, handling errors, not forgetting to fclose(), etc.
Again, normal course of the day for writing C code, but just more
fidgety and error-prone.  Invoking the compiler in C involves trickery:
I'd have to explicitly fork-and-exec, which means lots of manual
housekeeping to make sure fork succeeds, keep track of whether I'm in
the parent or child, set up the arguments to exec*(), manually
handle return codes, etc. In D, thanks to the awesome API of
std.process (kudos to Lars Kyllingstad, et al, for the excellent revamp
of the original klunky std.process), all I need to do is:

	import std.process : execute;

	auto result = execute([
		compiler,
		option1,
		option2,
		srcFile,
		"-of" ~ soFile
	]);
	if (result.status != 0) { /* handle compiler errors */ }

No need to manually fiddle with fork(), dup(), close(), execv(), and all
the paraphernalia associated with them.
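
Put together, the write-and-compile step might look something like the
following sketch. The function name, the fixed temp file name, and the
choice of dmd with -shared and -fPIC are assumptions made for this example,
not necessarily what the project uses; a real program would also want a
unique temp file name.

	import std.file : tempDir, write;
	import std.path : buildPath;
	import std.process : execute;

	/// Writes code to a temporary source file and compiles it into soFile.
	/// Returns the compiler's captured output; throws if compilation fails.
	string compileToSharedObject(string code, string soFile,
		string compiler = "dmd")
	{
		// Hypothetical fixed name; not safe if several instances run at once.
		string srcFile = buildPath(tempDir, "generated.d");
		write(srcFile, code);

		auto result = execute([
			compiler,
			"-shared",
			"-fPIC",
			srcFile,
			"-of" ~ soFile
		]);
		if (result.status != 0)
			throw new Exception("compilation failed:\n" ~ result.output);
		return result.output;
	}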

3) Even better yet, std.process.execute captures compiler output
automatically. In C, if I wanted to do that, I'd have to manually
redirect stdout/stderr, set up a pipe for copying the output
to a buffer, manage the buffer, etc. Again, nothing out of the ordinary
for a C coder but it's just a lot of fussy housekeeping that reduces
productivity.  In D, std.process handles it all nicely for me -- plus it
works on Windows too, should I ever desire to compile the code on
Windows.  In C, I'd have to rewrite the entire code to use Windows API
calls instead of Posix calls -- a major undertaking.  (Of course,
replacing dlopen & co with Windows calls is a different story -- perhaps
Phobos should have a new module for handling this!)

4) As it turns out, I *did* need to capture the compiler's output
eventually.  Here's where some of D's features made things highly
productive: the compiled code template declares a public symbol that the
main program needs to know, so that it knows what symbol to look up with
dlsym() when loading the resulting shared object.  I could easily
hardcode the mangling of this symbol, since I control what goes in the
code template.  However, it's ugly, and not future-proof: if D's
mangling scheme were to change in the future, the code would break.
Plus, it's silly to have to manually write out a mangled symbol when
the compiler can already do it for you.

The tricky part, though, is that .mangleof only works on an identifier
defined in the *current* program; the compiler can't do it for a symbol
in a string that's to be passed at runtime to another invocation of the
compiler.  And AFAIK, there's currently no way to ask the compiler "what
would be the mangling of mymodule.symbol?" if 'mymodule' and 'symbol'
only exist in the shared object, not in the main program.

So the solution is to insert a `pragma(msg, symbol.mangleof)` in the
code template, and have the main program parse the output of the
compiler when it compiles the shared object, so that it learns the
mangled identifier directly.  If this were C++ code, I'd be stuck up the
creek without a paddle: hardcoding the mangled symbol would be
unavoidable. But thanks to pragma(msg) and .mangleof in D, this was a
cinch.  Not to mention that std.process.execute already captures output
for me automatically, so all I need to do is to search for the symbol in
the captured output.

(Granted, in C there'd be no need for this dance, since I could just use
the (unmangled) symbol directly. But still, it's nice that D provides
the facilities for dealing with identifiers easily even in the face of
mangling schemes.)
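
The parsing side of the pragma(msg) trick can be sketched as below. The
"MANGLED:" prefix is an assumption added for this example: it supposes the
template emits pragma(msg, "MANGLED:" ~ symbol.mangleof) so that the line
is easy to tell apart from other compiler messages. findMangledName is
likewise a made-up name.

	import std.algorithm.searching : startsWith;
	import std.string : lineSplitter, strip;

	// Hypothetical marker that the code template is assumed to print
	// in front of the mangled name.
	enum marker = "MANGLED:";

	/// Scans captured compiler output for the marker line and returns
	/// the mangled name that follows it; throws if there is none.
	string findMangledName(string compilerOutput)
	{
		foreach (line; compilerOutput.lineSplitter)
		{
			auto s = line.strip;
			if (s.startsWith(marker))
				return s[marker.length .. $];
		}
		throw new Exception("no mangled name found in compiler output");
	}

The returned string is what would then be handed to dlsym().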

5) Exceptions in Phobos are not a bad thing: they greatly
streamline code that interacts with the OS (std.process.execute,
std.file.write, etc.). I can just write the steps sequentially without
the verbosity of C's `if ((err = syscall(...)) != 0) goto CLEANUP;`
dance; exceptions take care of handling error conditions for me without
uglifying the code.
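
As a small illustration of that style, the sketch below writes a file, runs
a command on it, and cleans up, with no error-code plumbing in between. The
runSteps name and the temp file name are invented for this example, and
scope(exit) is my choice of cleanup mechanism here, not something the
project necessarily uses.

	import std.file : remove, tempDir, write;
	import std.path : buildPath;
	import std.process : execute;

	/// Writes code to a temp file, runs command with that file appended
	/// as the last argument, and returns the command's output.
	string runSteps(string code, string[] command)
	{
		string srcFile = buildPath(tempDir, "steps_example.d");
		write(srcFile, code);          // throws FileException on failure
		scope (exit) remove(srcFile);  // runs on success and on failure alike

		// Throws ProcessException if the command cannot be started.
		auto result = execute(command ~ srcFile);
		if (result.status != 0)
			throw new Exception("command failed:\n" ~ result.output);
		return result.output;
	}

The caller needs at most one try/catch around the whole sequence, instead
of a check after every step.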

All in all, I was able to write all of this code in just a few hours'
worth of coding and unittesting, whereas the C equivalent took me
several days.  Hooray for D!

T

-- 
Customer support: the art of getting your clients to pay for your own incompetence.