Incremental compilation with DMD

Tue Sep 15 01:25:48 PDT 2009

Walter Bright wrote:
> What you can try is creating a database that is basically a lib (call it 
> A.lib) of all the modules compiled with -lib. Then recompile all modules 
> that depend on changed modules in one command, also with -lib, call it 
> B.lib. Then for all the obj's in B, replace the corresponding ones in A.

OK, there we go: http://h3.team0xf.com/increBuild2.7z     // I hope it's 
fine to include LIBUNRES here. It's just for convenience.

This is the second incarnation of that incremental build tool 
experiment. This time it uses -lib instead of -multiobj, as suggested by 
Walter.

The algorithm works as follows:

* compile modules to a .lib file
* extract objects with static ctors or the __Dmain function (remove them 
from the lib)
* find out which old object files should be replaced
	* any object with at least one symbol that was re-generated in this compilation pass
* pack up the obsoleted object files into a 'junk' library
* prepend the 'junk' library to the /library chain/
* prepend the newly compiled library to the /library chain/
* link the executable by passing the cached object files and the whole 
library chain to the linker

It doesn't use the simple approach of having just one 'junk'/'A.lib' 
library and appending objects to it, because that's pretty slow due to 
the librarian having to re-generate the dictionary at each such 
operation. So instead it keeps a chain of all libraries generated in 
this process and passes them to the linker in the right order. This will 
waste more space than the naive approach, but should be faster.
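
A minimal sketch of the bookkeeping behind the steps above, in Python rather than the tool's actual source. It only models object files as sets of symbol names; it does not call DMD, the librarian, or the linker. The names build_pass, new1.lib and junk1.lib are made up for illustration and are not taken from increBuild.

```python
def build_pass(chain, cached, compiled, link_directly, step):
    """Model one incremental pass.

    chain         -- list of (lib_name, {obj_name: symbols}), newest first
    cached        -- {obj_name: symbols} for objects handed straight to the linker
    compiled      -- {obj_name: symbols} produced by this compilation pass
    link_directly -- names in `compiled` that have static ctors or __Dmain
    step          -- pass counter, used only to build hypothetical lib names
    """
    # Every symbol that was re-generated in this pass.
    new_symbols = set()
    for symbols in compiled.values():
        new_symbols |= symbols

    # Cached objects sharing any re-generated symbol are obsolete.
    obsolete = {n: s for n, s in cached.items() if s & new_symbols}
    kept = {n: s for n, s in cached.items() if n not in obsolete}

    # Objects with static ctors / __Dmain are pulled out of the new library.
    extracted = {n: s for n, s in compiled.items() if n in link_directly}
    new_lib = {n: s for n, s in compiled.items() if n not in link_directly}

    # Prepend the junk library, then the new library, so the new one is first.
    new_chain = [(f"new{step}.lib", new_lib), (f"junk{step}.lib", obsolete)] + chain
    kept.update(extracted)
    return new_chain, kept


chain, cached = build_pass(
    [], {},
    {"main.obj": {"__Dmain", "main.helper"}, "util.obj": {"util.f"}},
    {"main.obj"}, 1)
chain, cached = build_pass(
    chain, cached, {"main.obj": {"__Dmain"}}, {"main.obj"}, 2)
print([name for name, _ in chain])
```

The linker would then receive the cached objects plus the chain in list order. Whether an empty junk library should be skipped is left open here.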

The archive contains the source code and a compiled binary (DMD-Win only 
for now... Sorry, folks) as well as a little test in the test/ 
directory. It shows how naive incremental compilation fails (break.bat) 
and how this tool works (work.bat).

The tool can be used with the latest Mercurial revision of xfBuild ( 
http://bitbucket.org/h3r3tic/xfbuild/ ) by passing "+cincreBuild" to it. 
The support is a massive hack though, so expect some strangeness.

I was able to run it on the 'Test1' demo of my Hybrid GUI ( 
http://team0xf.com:1024/hybrid/file/c841d95675ca/Test1.d ) and a 
simple/dumb ray tracer based on OMG ( 
http://team0xf.com:1024/omg/file/5199ed783490/Tracer.d ). In incremental 
compilation it's not noticeably slower than the naive approach. However, 
DMD consumes more memory in the -lib mode and the executables produced 
by this approach are larger for some reason. For instance, with Hybrid, 
Test1.exe is about 20MB with increBuild, compared to about 5MB with the 
traditional approach. Perhaps there's some simple way to remove this 
bloat, since after UPX compression, even with the fastest compression 
method, the executables differ by just a few kilobytes.

When building my second largest project, DMD eats up about 1.2GB of 
memory and dies (even without -g). Luckily, xfBuild allows me to set the 
limit of modules to be compiled at a time, so when I capped it at 200, it 
compiled... but didn't link :( Somewhere in the process a library is 
created that confuses OPTLINK as well as "lib -l". There's one symbol in 
it that neither of them is able to see, and it results in an 
undefined reference when linking. The symbol is clearly there when using 
a lib dumping tool from DDL or "libunres -d -c". I've dropped the lib at 
http://h3.team0xf.com/strangeLib.7z . The symbol in question is 
compressed and this newsgroup probably won't chew the non-ansi chars 
well, but it can be found via a regex "D2xf3omg4core.*ctFromRealVee0P0Z".
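
For anyone who wants to check the library without a lib dumping tool, here is a rough Python sketch that runs the regex above over raw bytes. It uses a non-greedy variant of the pattern so the match stays short, and DOTALL because the compressed part of the name may contain arbitrary bytes. The function name find_symbol is made up, and the sample bytes below are invented stand-ins, not the contents of strangeLib.

```python
import re

# Non-greedy form of the regex from the post, applied to bytes.
PATTERN = re.compile(rb"D2xf3omg4core.*?ctFromRealVee0P0Z", re.DOTALL)


def find_symbol(data: bytes):
    """Return (offset, matched bytes) for the first hit, or None."""
    match = PATTERN.search(data)
    if match is None:
        return None
    return match.start(), match.group(0)


# Invented sample: a few junk bytes around a name that fits the pattern.
sample = b"\x00\x01_D2xf3omg4core\x80\x81ctFromRealVee0P0Z\x00"
print(find_symbol(sample))
```

To try it on a real file, read it with open(path, "rb").read() and pass the result to find_symbol.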

One thing slowing this tool down is the need to call the librarian 
multiple times. DMD -lib will sometimes generate multiple objects with 
the same name and you can only extract them (when using the librarian) 
by running lib -x multiple times. DMD should probably be patched up to 
include fully qualified module names in objects instead of just the last 
name (foo.Mod and bar.Mod both yield Mod.obj in the library), as -op 
doesn't seem to help here.
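
To illustrate the clash, a small Python sketch: it derives object names from the last part of the module name, as described above, and lists the names that collide. The qualified_object_name scheme (dots replaced by dashes) is only a hypothetical example of what a patched DMD might emit, not something DMD does.

```python
from collections import Counter


def object_name(module: str) -> str:
    """Object name built from the last module name component only."""
    return module.rsplit(".", 1)[-1] + ".obj"


def qualified_object_name(module: str) -> str:
    """Hypothetical alternative: keep the whole module path in the name."""
    return module.replace(".", "-") + ".obj"


def collisions(modules):
    """Map each clashing object name to the number of modules producing it."""
    counts = Counter(object_name(m) for m in modules)
    return {name: n for name, n in counts.items() if n > 1}


modules = ["foo.Mod", "bar.Mod", "baz.Other"]
print(collisions(modules))
print([qualified_object_name(m) for m in modules])
```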

Another idea that will map well onto any incremental builder would be to 
write a tool that will find the differences between modules and tell 
whether e.g. they're limited to function bodies. Then an incremental 
builder could assume that it doesn't have to recompile any dependencies, 
just this one modified file. Unfortunately, this assumption doesn't 
always hold - functions could be used via CTFE to generate code, thus 
the changes escape. Personally I'm of the opinion that functions should 
be explicitly marked for CTFE, and this is just another reason for such. 
I'm using a patched DMD with an added pragma(ctfe), which instructs the 
compiler not to run any codegen or generate debug info for 
functions/aggregates marked as such. This trick alone can slim an 
executable down by a good megabyte, which sometimes is a life-saver with 
OPTLINK. I've been hearing that other people put their CTFE stuff into 
.di files, but this approach doesn't cover all cases of codegen via CTFE 
and string mixins.
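
A very rough Python sketch of the "are the changes limited to function bodies" check. It is a text heuristic, not a D parser: a '{' counts as the start of a function body only when the last non-blank character before it is ')'. It ignores strings, comments, contracts, and trailing attributes such as const, and it would misread a module-level "static if (...) {". The names skeleton and only_bodies_changed are made up. As said above, a positive answer is still unsafe when the function is used via CTFE.

```python
def skeleton(source: str) -> str:
    """Return the source with every detected function body replaced by {}."""
    out = []
    last = ""  # last non-whitespace character written so far
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == "{" and last == ")":
            # Skip to the matching closing brace by counting nesting depth.
            depth = 0
            while i < n:
                if source[i] == "{":
                    depth += 1
                elif source[i] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            out.append("{}")
            last = "}"
        else:
            out.append(ch)
            if not ch.isspace():
                last = ch
        i += 1
    return "".join(out)


def only_bodies_changed(old: str, new: str) -> bool:
    """True when the files differ but their skeletons are identical."""
    return old != new and skeleton(old) == skeleton(new)


old = "int f(int x) { return x + 1; }\nstruct S { int a; }\n"
new = "int f(int x) { return x + 2; }\nstruct S { int a; }\n"
sig = "int f(long x) { return x + 1; }\nstruct S { int a; }\n"
print(only_bodies_changed(old, new), only_bodies_changed(old, sig))
```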

I'm afraid I won't be doing any more prototypes any time soon - I really need 
to focus on my master's thesis :P But then, I don't really know how this 
tool can be improved without hacking the compiler or writing custom OMF 
processing.

-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode