Dynamic Code in D

Sat Jan 12 12:12:39 PST 2008

bearophile wrote:
> D is a young language, it has lot of rough corners, so you have to forget
> the high level of polish that Java systems have today.

I know; Java was awful when I first started using it in 1997 (not by 
choice, at the time), but these days it's actually very good for
large, reasonably long-running applications. However, while the runtime
system is very good and the standard library is decent, the language
itself is kinda gnarly and obviously restrictive compared to C (thus
the use of JNI in this app).

> I like D, and hopefully more and more people will start using it, but I
> think today it's not much fit to replace "serious" Java programs.

For this app, there are three types of server, the web cluster, the DB
cluster and the compute cluster. The web cluster has all the interface
and application logic, and that code is definitely staying in Java
(+JavaScript on the client side). All the persistent data stays
on the file/DB servers, in Postgres and some big binary files. The
compute servers are the only ones I am considering moving to D. If it
weren't for the SSE code I probably wouldn't, but that JNI overhead is
seriously annoying. Plus I can do low-level things such as NUMA
optimisation from D much more easily than I can from Java.

> HotSpot inlines methods, and it has a really good GC, while DMD
> currently seem to not inline them and its GC is snail-slow compared
> to the refined HotSpot (not the server one) one.

I'm not concerned about GC since there isn't much temporary object
usage; the bulk of the data is in a relatively small number of huge
arrays (which I may well go to manual memory management for in D
anyway). No method inlining is a little more serious, but then I'll
have to use GDC rather than DMD for the 64-bit support, and
presumably GDC inlines methods OK? If not, I could make the GP
output stage do it; the call depth and function size are low enough
that it shouldn't be difficult.

> For that purpose CommonLisp looks like the a good language (beside
> Java itself). It's fast enough if used well, and it compiles the
> functions on the fly. There are some disadvantages, like it's not
> easy to find CLisp programmers.

It's even harder to find D developers, but it's easy for developers
who know C/C++/C#/Java to learn D (hopefully! since I'm learning it
and if I used it on a production system other developers at the
company would have to as well). Lisp, not so much. I guarantee that
if I said 'I want to reimplement in Lisp' the answer would be no,
while I received provisional approval for using D on the basis that
'it's very similar to Java, our developers can learn it easily'.

Aside from that, the other problems with using Lisp in this app are:
1) /Much/ more work to port the sections of code I want to reuse
from the old system across.
2) I don't know of any CL implementations with SSE intrinsics. So
unless someone else knows one I'd be stuck linking to C code again.
It's possible that there's a CL implementation with less overhead
for linking dynamic code to external C libraries than JNI has, but
if so I don't know of it.
3) AFAIK, the same issues with NUMA optimisation as Java, and worse
synchronisation performance (though again, there may be some
super-optimal CL implementation out there I don't know about).

> How much time does a single fitness computation take on average?
> If it's really little time (less than 30-40 seconds) then
> compiling/interfacing timings become important.

This depends on the dataset size. For a typical dataset, on each pass
the GP engine generates a few thousand classifiers based on a few
hundred (dynamic) component functions. Computing the fitness of those
takes somewhere between 10 seconds and a minute, on a server with
eight Opteron 8218s. This is heavily optimised to reuse intermediate
values where possible, which is where the moderate amount of thread
synchronisation comes from (and why one big server outperforms a
cluster of smaller servers that cost the same in total and have more
aggregate compute performance, though we're working on that). Each run makes several
hundred of these passes.

So the compiler is invoked a few times a minute on the equivalent
of a couple of thousand lines of code. Compiler speed shouldn't be
a major factor.

> Often the bigger gains in such evolutionary programs come from
> "improving" the search space, adding more heuristics, etc. Not
> from going from Java to D.

Absolutely, but I'm not responsible for that (though I certainly
make suggestions). My job is to make the algorithms the research
team comes up with run on commercial datasets (my team as a whole
is responsible for 'productisation').

>> 2. High-performance synchronization. As of 1.6, Java's monitor
>> implementation is quite impressive;<
>
> I don't know the answer, but in most things D is far from being tuned
> and refined, etc. In that kind of things only C# may be better.

That's something I could potentially help with once I'm decently
familiar with the language, if the GDC developers are accepting patches.
Otherwise, I could just implement my own synchronization, via
inline assembly (since this doesn't have to be multi-platform) or C.
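For the "roll my own in C" route, a minimal sketch of what that might look like: a test-and-test-and-set spinlock built on C11 `<stdatomic.h>`. The type and function names (`simple_spinlock`, `spin_lock` and so on) are hypothetical, and this is only sensible for very short critical sections; a real replacement for Java's monitors would also need backoff and a way to block rather than spin indefinitely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock on C11 atomics (not a D or GDC API).
   Only sensible for very short critical sections. */
typedef struct {
    atomic_bool locked;
} simple_spinlock;

static void spin_init(simple_spinlock *l) {
    atomic_init(&l->locked, false);
}

/* Returns true if the lock was acquired without waiting. */
static bool spin_trylock(simple_spinlock *l) {
    /* exchange returns the previous value: false means it was free. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void spin_lock(simple_spinlock *l) {
    while (!spin_trylock(l)) {
        /* Test-and-test-and-set: wait with plain loads until the lock
           looks free, so waiting threads don't keep writing the cache line. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

static void spin_unlock(simple_spinlock *l) {
    /* Debug check: unlocking a lock that isn't held is a bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    /* Release ordering publishes writes made inside the critical section. */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The acquire/release pairing is what makes data written under the lock visible to the next holder; the inner relaxed-load loop is purely a cache-traffic optimisation.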

>> 3. SSE intrinsics. Does GDC have the equivalent of GCC SSE intrinsics yet
> 
> Nope, I think.

Clearly they were scheduled to go in last year (from the post I linked
to); has there been a disruption to GDC development recently?
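For reference, the GCC intrinsics in question look like this: a sketch of an SSE dot product using `<xmmintrin.h>` (`_mm_setzero_ps`, `_mm_loadu_ps`, `_mm_mul_ps`, `_mm_add_ps`, `_mm_storeu_ps`). `dot_sse` is a hypothetical name; the scalar tail loop means `n` need not be a multiple of 4, and the `__SSE__` guard falls back to plain C on targets without SSE.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Dot product of two float arrays. Uses SSE (4 floats per instruction)
   when the compiler targets it, then a scalar loop for leftover
   elements; on non-SSE targets the scalar loop does all the work. */
static float dot_sse(const float *a, const float *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    size_t i = 0;
    float sum = 0.0f;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads, so callers need not 16-byte-align arrays. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    /* Horizontal sum of the four partial sums. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Because the SSE path adds in a different order from a naive loop, results can differ from it in the last few bits. Until GDC has equivalents, code like this could be compiled with GCC and called from D through an `extern (C)` declaration, which avoids the JNI call overhead.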

> If the running time of that fitness function is very little you may need to
> find something faster than GCC, like TinyCC (but I think that's not your
> situation, and most probably your fitness functions are small, so they are
> very quick to compile).

Interesting suggestion, but unfortunately TinyCC does not (AFAIK) target
AMD64, which is also a problem for a lot of CL implementations. I'll have a look
around for similar projects that do though.

> Look for other alternative languages/solutions too, like CLisp, Java TinyCC
> compiled code, Python+Psyco with Cython compiled code on the fly, etc.

I am doing so; D is one of them. :) The very first thing I tried was
Excelsior JET, a mixed-mode compiler for Java (static where possible,
JIT for new code loaded at runtime) that I'd had good results with in
the past. Unfortunately, performance dropped significantly, even when I
'cheated' by saving a log of all the GP functions generated in a run and
supplying them to JET to compile statically for the benchmark run. As
you noted, HotSpot is really quite good these days.

> I think you can answer most of your questions about D doing some small
> benchmarks for 1-2 days only. Maybe you will like the language but not
> its performance for your purposes :-)

Yep; I'll probably start with the 'multiple processes passing data via
shared memory' solution, as that sounds like the simplest to implement.
If and when I've got that working as a proof of concept, I'll take a
look at LLVM.
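The "multiple processes passing data via shared memory" idea, reduced to a minimal POSIX sketch: the parent maps an anonymous `MAP_SHARED` region with `mmap`, fills in the inputs, `fork`s a worker that writes its results into the same region, and reads them back after `waitpid`. The `shared_block` layout and `square_sum_via_worker` are hypothetical stand-ins for real fitness data. Unrelated processes (such as a JVM and a separately launched compute process) would instead open a named segment with `shm_open` and coordinate through a semaphore or similar, rather than relying on `fork`/`waitpid`.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of a block shared between the controlling
   process and a compute worker process. */
typedef struct {
    size_t n;          /* number of inputs in use */
    double input[64];  /* written by the parent before fork() */
    double output[64]; /* written by the child */
} shared_block;

/* Fill inputs 1..n, fork a worker that squares each input into the
   shared block, wait for it, then sum its results.
   Returns -1.0 on failure. */
static double square_sum_via_worker(size_t n) {
    assert(n <= 64);
    shared_block *blk = mmap(NULL, sizeof(shared_block),
                             PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (blk == MAP_FAILED)
        return -1.0;
    blk->n = n;
    for (size_t i = 0; i < n; i++)
        blk->input[i] = (double)(i + 1);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(blk, sizeof(shared_block));
        return -1.0;
    }
    if (pid == 0) {
        /* Child: the mapping is MAP_SHARED, so these writes land in
           the same physical pages the parent reads after waitpid(). */
        for (size_t i = 0; i < blk->n; i++)
            blk->output[i] = blk->input[i] * blk->input[i];
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        munmap(blk, sizeof(shared_block));
        return -1.0;
    }
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += blk->output[i];
    munmap(blk, sizeof(shared_block));
    return sum;
}
```

The point of the design is that the bulk data never gets copied or marshalled: both processes address the same pages, and the only cross-process traffic is the "work is done" signal (here the child exiting).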

Paolo Invernizzi wrote:
 > I guess the best approach it's LLVM...
 > Take a look at http://www.llvm.org
 > Cheers, Paolo

Thanks! That does look really interesting. I had a quick look at
GNU Lightning, but it's almost entirely unoptimising, and thus probably
unusable. LLVM looks much better at first glance and I should be able
to target LLVM bytecode directly, but I'll have to investigate the 
feasibility of linking against D. Googling, it looks like there have
been a few attempts to use LLVM and D together before, but mostly in
the sense of using it as a third static compiler (e.g.
http://www.dsource.org/projects/llvmdc). I will have to do some more
research and see if there is anything I could get involved with.
Needless to say, if I can get my company to contribute engineering
time to some interesting open source development, I will. :)

Robert Fraser wrote:
 > Wow; that's a cool project and as much of it as you're willing to
 > open-source would be awesome, even if it mainly serves as an
 > educational tool.

I'm pretty sure the core algorithms will never be open sourced, but
something like a dynamic code library definitely would be OSed, and
I'm hopeful that the whole GP engine might eventually be OSed.

CptJack wrote:
 > Everything you need to do is provided by Common Lisp, and I don't mean
 > "CLisp", which is a byte-code interpreted CL implementation. Contrary
 > to what most people assume, almost all CL implementations compile
 > functions on-the-fly to machine code. That's raw machine object code,
 > not byte-code, compiled and loaded dynamically while the system is
 > still running. This sounds exactly like what you want.

That sounds exactly like what the Java version is already doing. The
only obvious benefit with Lisp is a slight simplification of the GP
engine output stage. As I've noted, HotSpot is very well optimised
these days; I'm not aware of a CL implementation that does better, and
indeed most perform significantly worse. D would have the advantage of
being able to link to C with effectively zero overhead, do manual memory
allocation and synchronization where beneficial (i.e. NUMA-optimised
memory allocation that still plays well with the GC) and be able to use
inline assembly for SSE code. CL doesn't have any of those advantages
AFAIK, plus as I mentioned earlier it's (much) harder to get new
developers for and (significantly) harder to train existing developers
in. I'm actually genuinely surprised that the research team chose to
develop the original algorithms in Java rather than Lisp; it would
have been a good fit with that task, plus they're ex-academics and
thus inherently more likely to use Lisp. But I don't think that it's
a good choice for the production system, particularly given that the
current code is in Java.

Michael Wilson
Systems Developer
Purple Frog Text Ltd