What is the compilation model of D?

Nick Sabalausky SeeWebsiteToContactMe at semitwist.com
Tue Jul 24 19:00:56 PDT 2012


On Wed, 25 Jul 2012 02:16:04 +0200
"David Piepgrass" <qwertie256 at gmail.com> wrote:

> (Maybe this should be in D.learn but it's a somewhat advanced 
> topic)
> 
> I would really like to understand how D compiles a program or 
> library. I looked through TDPL and it doesn't seem to say 
> anything about how compilation works.
> 

The compilation model is very similar to C or C++, so that's a good
starting point for understanding how D's works.

Here's how it works:

Whatever file *or files* you pass to DMD on the command line, *those*
are the files it will compile and generate object files for. No more,
no less.

However, in the process, it will *also* parse and perform semantic
analysis on any files that are directly or indirectly imported, but it
won't actually generate any machine code or object files for them. (It
finds these files via the -I<path> command line switch you pass to
DMD; -I is roughly D's equivalent of Java's classpath.)
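
For example (the file and directory names here are made up), suppose
main.d contains "import helper;" and helper.d lives in ../libsrc.
Compiling only main.d could look like this:

$dmd -c -I../libsrc main.d

The -c switch (covered below) means "compile only, don't link". DMD
generates main.o (main.obj on Windows) and nothing else; helper.d is
parsed and analyzed just far enough for the import to resolve, but no
object file is produced for it.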

This does mean that, unlike the usual C/C++ practice of invoking the
compiler separately for each source file, it's generally much faster
to pass all your files to DMD at once.

After DMD generates the object files for all source files you give it,
it will automatically send them to the linker (OPTLINK on Windows, or
gcc/ld on POSIX) to be linked into an executable. That is, *unless* you
give it either -c ("compile-only, do not link") or -lib ("generate
library instead of object files"). That way, you can link manually if
you wish.
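
As a rough sketch (file names made up), the two routes look like this:

$dmd main.d util.d
{compiles both files and links them into an executable - ./main on
POSIX, main.exe on Windows}

$dmd -c main.d util.d
{compile only: produces main.o and util.o, no linking}

$dmd main.o util.o
{hands the object files back to DMD, which simply forwards them to the
linker}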

So typically, you pass DMD all the .d files in your program, it
compiles them all, and it passes them to the linker to be linked into
an executable. But if you don't want to link automatically, you don't
have to. You can also compile each file separately if you want (though
that would be much slower - probably almost as slow as C++, but not
quite).

But that's just the DMD compiler itself. Instead of using DMD
directly, there's a better, more modern approach that's generally
preferred: RDMD.

If you use rdmd to compile (instead of dmd), you *just* give it
your *one* main source file (typically the one with your "main()"
function). This file must be the *last* parameter passed to rdmd:

$rdmd --build-only (any other flags) main.d

Then, RDMD will figure out *all* of the source files needed (using
the real compiler front end, so it never gets fooled into missing
anything), and if any of them have changed, it will automatically
pass them *all* into DMD for you. This way, you don't have to
manually keep track of all your files and pass them all into
DMD yourself. Just give RDMD your main file and that's it, you're golden.

Side note: Another little trick with RDMD: Omit the --build-only and
it will compile AND then run your program:

$cat simpleecho.d
import std.stdio;
void main(string[] args)
{
	writeln(args[1]);
}

$rdmd simpleecho.d "Anything after the .d file is passed to your app"
{automatically compiles all sources if needed}
Anything after the .d file is passed to your app

$wheee!!
command not found


> - Does it compile all source files in a project at once?

Answered this above. In short: it compiles whatever you give it (and
processes, but doesn't compile, any needed imports). Unless you use
RDMD, in which case it automatically detects and compiles all your
needed sources (and skips the build entirely if none of them have
changed).

> - Does the compiler have to re-parse all Phobos templates (in
> modules used by the program) whenever it starts?

Yes. (Unless you never import anything from Phobos...I think.) But
it's very, very fast to parse. Lightning-fast if you compare it to C++.

But it shouldn't run full semantic analysis on templates that are never
actually used. (Unless they're used in a piece of dead code.)
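
A quick illustration of that (a made-up snippet): the body of a
template that's never instantiated only gets parsed, so the bogus
member access below goes unnoticed until somebody actually
instantiates the template:

// deferred.d - compiles fine, because broken() is never instantiated
auto broken(T)(T x)
{
	return x.noSuchMember; // only an error once broken is instantiated
}

void main() {}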

> - Is there any concept of an incremental build?

Yes, but there are a few gotchas:

1. D compiles so damn fast that it's not nearly as much of an issue as
it is with C++ (which is notoriously ultra-slow compared
to...everything, hence the monumental importance of C++'s incremental
builds).

2. Historically, there have been problems with templates when
compiling incrementally. DMD has been known to get confused about which
object file it put an instantiated template into, which can lead to
occasional linker errors. These errors can be fixed by doing a full
rebuild (which is WAAAY faster than it would be with C++). I don't know
whether or not this has been fixed.

3. Incremental building typically involves compiling files
one-at-a-time. But with D, you get a HUGE boost in compilation speed by
*not* compiling one-at-a-time. So if you have a huge, slow-to-compile
codebase (one that takes, say, 15 seconds to build), and you change a
handful of files, it may actually be much *faster* to do a full rebuild
(since you're not re-analyzing all the imports over and over). Of
course, you could probably get around that by passing all the changed
files (and only the changed files) into DMD at once instead of
one-at-a-time (see the sketch after this list), but I don't know
whether typical build tools (like make) can realistically handle that.
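
For example (made-up file names), a classic make-style incremental
build versus an all-at-once rebuild would look something like:

$dmd -c util.d
{recompile just the module that changed}

$dmd main.o util.o net.o
{relink}

$dmd main.d util.d net.d
{full rebuild: recompile and relink everything in one go}

With D's compilation speed, the full rebuild is often just as fast as
the incremental route, or faster.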


> - Obviously, one can set up circular dependencies in which the 
> compile-time meaning of some code in module A depends on the 
> meaning of some code in module B, which in turn depends on the 
> meaning of some other code in module A. Sometimes the D compiler 
> can resolve the ultimate meaning, other times it cannot. I was 
> pleased that the compiler successfully understood this:
> 
> // Y.d
> import X;
> struct StructY {
> 	int a = StructX().c;
> 	auto b() { return StructX().d(); }
> }
> 
> // X.d
> import Y;
> struct StructX {
> 	int c = 3;
> 	auto d()
> 	{
> 		static if (StructY().a == 3 && StructY().a.sizeof == 3)
> 			return 3;
> 		else
> 			return "C";
> 	}
> }
> 
> But what procedure does the compiler use to resolve the semantics 
> of the code? Is there a specification anywhere? Does it have some 
> limitations, such that there is code with an unambiguous meaning 
> that a human could resolve but the compiler cannot?
> 

It keeps diving deeper and deeper until it finds something it can
"start" with. Once it finds that, it'll just build everything back up
in whatever order is necessary.

If it *truly is* a circular definition, and there isn't any place
it can actually start with, then it issues an error.

(If there are any cases where it doesn't work this way, they should be
filed as bugs in the compiler.)
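
For instance (a deliberately trivial, made-up example), this has no
place the compiler can start from, so DMD should reject it with a
circular-reference error:

// circular.d - DMD should report a circular reference error here
enum int a = b;
enum int b = a;

The StructX/StructY example above is fine, by contrast, because the
compiler can fully resolve StructX.c (it's just "int c = 3;") without
ever needing anything from StructY.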

> - In light of the above (that the meaning of D code can be 
> interdependent with other D code, plus the presence of mixins and 
> all that), what are the limitations of __traits(allMembers...) 
> and other compile-time reflection operations, and what kind of 
> problems might a user expect to encounter?

Shouldn't really be an issue. Such things won't get evaluated until the
types/identifiers involved are *fully* analyzed (or at least to the
extent that they need to be analyzed). So the results of things like
__traits(allMembers...) should *never* change during compilation, or
when changing the order of files or imports (unless there's some
compiler bug). Any situation that *would* result in any such ambiguity
will get flagged as an error in your code.
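
For example (a made-up snippet), members injected by a string mixin
show up in __traits(allMembers, ...) just like hand-written ones,
because the trait is evaluated against the fully analyzed type:

// reflect.d
import std.stdio;

struct S
{
	int x;
	mixin("int y;"); // mixed-in member, visible to reflection below
}

void main()
{
	// prints the members of S, including the mixed-in "y"
	foreach (name; __traits(allMembers, S))
		writeln(name);
}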

I would, however, recommend avoiding static constructors and module
constructors whenever you reasonably can. If you have a circular
import (i.e., module a imports b, which imports c, which imports
a), then that's normally OK, *UNLESS* those modules also have static
and/or module constructors. If they do, then the startup code D builds
into your application won't know which needs to run first (it doesn't
analyze the actual code, it just assumes there *could* be an
order-of-execution dependency), so you'll get a circular dependency
error when you run your program. The safest, easiest way to get rid
of those errors is to eliminate one or more of the static/module
constructors.
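
Here's a minimal sketch of that failure mode (module names made up).
All three files compile and link fine, but the program dies at startup
with a cyclic-dependency error from the runtime:

// a.d
module a;
import b;
static this() { /* module constructor, runs before main() */ }

// b.d
module b;
import a;
static this() { /* module constructor, runs before main() */ }

// main.d
import a;
void main() {}

Delete either of the static this() blocks (or break the import cycle)
and the error goes away.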


