Dynamic Code in D

Sat Jan 12 09:41:40 PST 2008

I'm responsible for an existing Java application that is the core
product of my company. However it's essentially a productised prototype
and needs a rewrite, or at least some serious refactoring, sometime
this year. I am considering doing this in D; I like the language design
and there are potential performance and maintainability benefits. As it
turns out, this may entail creating a new D library. Thus I'd like to
ask the following questions;

1) Is this practical at all?
2) Is there a better way to do this?
3) Would anyone other than us want to use this library if we made it
(i.e. is it worth open-sourcing).

The system carries out data mining tasks using algorithms generated by a
genetic programming engine. The original prototype (developed before I
joined the company) did this by construcing a Lisp-style function tree
out of Java objects, then running that in an interpreter (i.e. it used
ECJ). Unsurprisingly this was hideously slow. The current version that
I designed works by compiling the function tree to Java bytecode (as
static methods in a new class), calling an in-memory classloader on the
output (allowing Java to link it), then allowing Hotspot to translate
it to native machine code. However to get good performance on large
datasets, I had to implement the terminal functions (i.e. the inner
loops) in C, using GCC's SSE intrinsics. The dynamic java code calls
this via the Java Native Interface (specifically, a class full of static
native methods). However JNI has a significant call overhead, which is
eating up a good fraction of the performance boost, as well as creating
additional development/maintenance hassle. This system is running on a
couple of beefy 16 core Opteron servers and it still takes hours to run
on large datasets.

The major problem with implementing in D is the lack of runtime
compliation and the AFAIK subpar support for dynamic linking. I'll get
to that in a bit. The other potential problems are;

1. Support for AMD64 on Linux is absolutely essential; these datasets
are several gigabytes in size and need to be loaded into memory. Plus
there is a nontrivial amount of 64-bit integer manipulation in the Java
code. Linux obviously because what sane person would waste money on
Windows Server licenses for compute servers? I haven't been able to
track down the latest estimate for when DMD will support AMD64, but at
least GDC does now. Is this stable enough for production use?

2. High-performance synchronization. As of 1.6, Java's monitor
implementation is quite impressive; the biased locking mechanism means
that atomic operations are only required for a small fraction of monitor
acquisitions. I've been trying to find out if D's implementation uses
the same mechanism. This is probably only good for a few percent of
performance, but on 16-way NUMA locking performance can make a
difference even when it isn't in the inner loops.

3. SSE intrinsics. Does GDC have the equivalent of GCC SSE intrinsics
yet (i.e. "target builtins")? This post suggests that it does;
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=D.gnu&artnum=2622
but I can't see anything in the documentation about them. If not I'll
have to continue linking to the existing C code, which eliminates a
significant benefit of going to D (simplifying the code base - but at
least the JNI call & pinning overhead will be gone). Ideally
std.intrinsic would have these (presumably that's how DMD is going to
include them, if an when it does - as far as I can tell, DMD is still
doing even basic double-precision operations with slow x87 code rather
than SSE code, unless
http://www.digitalmars.com/d/archives/digitalmars/D/Optimizations_of_DMD_49840.html
is out of date).

Ok, on to the serious problem, dynamic code support. The GP engine is
outputting a function tree structure. I can trivially output this as D
or Java source. I can output it as Java bytecode with a bit more effort.
I really don't want to have to compile it to x86 machine code (and link
it against the terminal functions) myself, particularly given that the
research guys keep elaborating the algorithm. Here's what I've
considered so far;

1. Output as C source plus a JNI stub in bytecode. Invoke gcc on it to
build a dynamic library. Load the class in Java. This would remove most
of the JNI overhead (because the Java/C crossover would be pushed 4 to
10 places up the call graph). However I'd have to keep invoking GCC as
a seperate process, though I could cut down the overhead of this
somewhat by putting it and the dynamic files on a ramdisk. I'm unsure
what the performance would be like under Linux of constantly linking
new shared libraries, and I'm not sure if the Sun JVM is capable of
unloading native code along with unloading classes (though this memory
leak is probably insignificant over the lifetime of the process). This
is the solution I will be using if I can't find a good D solution.

2. Port to D. Change the GP system to target MiniD bytecode. I haven't
examined MiniD in detail yet, but I get the impression that this will
be marginally harder than targetting Java bytecode. Alternatively I
could target MiniD source, which would be simpler for me, at some
additional compilation cost, or I could dig into the MiniD compiler
source and see if it has an AST I could target. The later would be
ideal if the code is stable, but it probably isn't.

MiniD is purely interpreted, so performance of the dynamic code will
almost certainly suck badly compared to Hotspot. On the plus side,
'linking' cost is effectively zero. The inner loops will still have
to be in either D (if it has decent SSE intrinsics now) or C (if it
doesn't); at first glance MiniD native invokation overhead looks better
than JNI (no need to pin for one thing, since the GC is non-copying)
but still nontrivial.

3) Port to D. Change the GP system to target D source. Batch code
updates together. On each update, recompile the codebase using gdc
running in a ramdisk. Start a new process using the new code. Keep
all the persistent data in a shared memory segment. Get the new
process to map in the shared segment, then kill the old process.
Processing can continue from where it left off using the updated code.

AFAIK this /should/ work, but I don't know what the overheads are
going to be like. Having to batch together the code updates is
mildly inconvenient from the algorithm design side. I'm not sure what
if any incremental compilation features GDC has; perhaps I could path
it to stay in memory and stream new classes to it over a pipe.

4) Port to D. Change the GP system to target D source. Batch code
updated together. On each update, link the new classes in with DDL.

The major problem with this is that AFAIK DDL isn't working on Linux
yet (at all, never mind on AMD64) and is still beta quality code
even on Windows. In addition to that, I'm not clear if it can ever
recover the memory for unused code, though as I mentioned for (1) this
leak isn't likely to be a problem over the lifetime of the process.
Plus the GDC invokation overhead as for (3).

and finally...

6) Write a new 'D Dynamic Code Library'. My best guess for how this
should work is this;

a) This library will statically link to GDC and the GCC back end.
b) On first invokation, look for a metadata file. If it isn't present or
is out of date, recompile the app from source and keep the relevant
GDC state in memory. Otherwise, deserialise GDC state from the metadata
file.
c) Supply functions that accept D source and possibly one or more of the
GCC intermediate representations (i.e. AST, CFG, RTL). These would call
through to GDC/GCC, intercept the output and write it into the code
page, then perform runtime linking as necessary. This is probably a lot
of work, but I am unsure exactly how much.
d) Open source this library in case anyone else wants to use it.

Any thoughts on any of the above? Bear in mind that I have a mild bias
towards using D even if it is some extra work to get things going, but I
am certainly not going to accept worse overall performance than the Java
version.

Michael Wilson
Systems Developer
Purple Frog Text Ltd