What is the compilation model of D?

Wed Jul 25 15:06:49 PDT 2012

On Wed, 25 Jul 2012 21:54:29 +0200
"David Piepgrass" <qwertie256 at gmail.com> wrote:

> Thanks for the very good description, Nick! So if I understand 
> correctly, if
> 
> 1. I use an "auto" return value or suchlike in a module Y.d
> 2. module X.d calls this function
> 3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps
> 

See, now you're getting into some details that I'm not entirely
familiar with ;)... 

> Then the compiler will have to fully parse Y twice and fully 
> analyze the Y function twice, although it generates object code 
> for the function only once. Right?

That's my understanding of it, yes.
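
To make the scenario concrete, here's a minimal sketch of why an "auto" return forces the body to be analyzed by whoever calls it. The names halve and useHalve are made up for illustration, and both functions sit in one file here just so the snippet stands alone; in the scenario above the first would live in Y.d and the second in X.d.

```d
// Hypothetical function; in the scenario above it would live in Y.d.
// The return type is not written out. It is inferred from the body,
// so a compile step that only sees a caller still has to analyze
// this body to learn what type comes back.
auto halve(int n)
{
    return n / 2.0; // int divided by double yields double
}

// Hypothetical caller; in the scenario above it would live in X.d
// and import Y.
double useHalve()
{
    // Type-checks only once the inferred return type of halve is known.
    return halve(7);
}
```

With an explicit return type on halve, a caller would only need the signature. This sketch says nothing about how much extra work DMD actually does in that case.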

> I wonder how smart it is about 
> not analyzing things it does not need to analyze (e.g. when Y is 
> a big module but X only calls one function from it - the compiler 
> has to parse Y fully but it should avoid most of the semantic 
> analysis.)

I don't know how smart it is about that.

If you have a template that never gets instantiated by *anything*, then
I do know that semantic analysis won't get run on it since
D's templates, like C++ templates (and unlike C#'s generics) can *only*
be evaluated once they're instantiated.  
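
A small sketch of that behavior, with made-up names (neverInstantiated, thisSymbolDoesNotExist, answer). The template body refers to a symbol that doesn't exist anywhere, yet the file compiles, because the body is only parsed until something instantiates it:

```d
// Never instantiated by any code that gets compiled in.
// The body calls a function that does not exist, but that is only
// discovered during semantic analysis, which needs an instantiation.
void neverInstantiated(T)(T value)
{
    thisSymbolDoesNotExist(value);
}

// Asking the compiler whether an instantiation *would* compile shows
// that the error is real; it just stays dormant until then.
static assert(!__traits(compiles, neverInstantiated(1)));

// Ordinary function, included so the module has something to call.
int answer()
{
    return 42;
}
```

The body still has to be syntactically valid, since it does get parsed.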

If, OTOH, you have a plain old function that never gets called, I'm
guessing semantics probably still get run on it.

Anything else: I dunno. :/

> 
> What about templates? In C++ it is a problem that the compiler 
> will instantiate templates repeatedly, say if I use 
> vector<string> in 20 source files, the compiler will generate and 
> store 20 copies of vector<string> (plus 20 copies of 
> basic_string<char>, too) in object files.
> 
> 1. So in D, if I compile the 20 sources separately, does the same 
> thing happen (same collection template instantiated 20 times with 
> all 20 copies stored)?

Again, I'm not certain about this, other people would be able to
answer better, but I *think* it works like this:

If you pass all the files into DMD at once, then it'll only evaluate
and generate code for vector<string> once. If you pass the files in
as separate calls to DMD, then it'll do semantic analysis on
vector<string> twenty times, and I have no idea whether code will get
generated one time or twenty times.

> 2. If I compile the 20 sources all together, I guess the template 
> would be instantiated just once, but then which .obj file does 
> the instantiated template go in?
> 

Unless things have been fixed since last I heard, this is actually the
root of the problem with incremental compilation and templates. The
compiler apparently makes some odd, or maybe inconsistent choices about
what obj to stick the template into. I don't know the details of
it though, just that in the past, people attempting to do incremental
compilation have run into occasional linking issues that were traced
back to problems in how DMD handles where to put instantiated
templates. 

> 
> I don't even want to legitimize C++ compiler speed by comparing 
> it to any other language ;)
> 

Fair enough :)

> >> - Is there any concept of an incremental build?
> >
> > Yes, but there's a few "gotcha"s:
> >
> > 1. D compiles so damn fast that it's not nearly as much of an
> > issue as it is with C++ (which is notoriously ultra-slow compared
> > to...everything, hence the monumental importance of C++'s
> > incremental builds).
> 
> I figure as CTFE is used more, especially when it is used to 
> decide which template overloads are valid or how a mixin will 
> behave, this will slow down the compiler more and more, thus 
> making incremental builds more important. A typical example would 
> be a compile-time parser-generator, or compiled regexes.
> 

That's probably a fair assumption.
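
For anyone following along, here's a tiny sketch of the kind of thing being described: CTFE deciding which template overload is valid. The names isPowerOfTwo and describe are invented for the example. Every instantiation makes the compiler run the constraint function at compile time, which is where the extra compile-time cost would come from as this style gets used more heavily.

```d
// Ordinary function: callable at run time and, through CTFE,
// at compile time as well.
bool isPowerOfTwo(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Each constraint is evaluated at compile time for every instantiation.
// The result decides which of the two overloads is valid.
string describe(size_t n)() if (isPowerOfTwo(n))
{
    return "power of two";
}

string describe(size_t n)() if (!isPowerOfTwo(n))
{
    return "not a power of two";
}
```

Here describe!8() picks the first overload and describe!6() the second. A compile-time parser generator does the same sort of thing on a much larger scale.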

> Plus, I've heard some people complaining that the compiler uses 
> over 1 GB RAM, and splitting up compilation into parts might help 
> with that.
> 

Yea, the problem is, DMD doesn't currently free any of the memory it
takes, so mem usage just grows and grows. That's a known issue that
needs to be taken care of at some point. 

> BTW, I think I heard the compiler uses multithreading to speed up 
> the build, is that right?
> 

Yes, it does. But someone else will have to explain how it actually uses
multithreading, i.e., what it multithreads, because I've got no clue ;)
I think it's fairly coarse-grained, like on the module-level, but
that's all I know.

> > It keeps diving deeper and deeper to find anything it can
> > "start" with. Once it finds that, it'll just build everything
> > back up in whatever order is necessary.
> 
> I hope someone can give more details about this.
> 

I hope so too :)