Proposed improvements to the separate compilation model

Sat Jul 23 14:54:57 PDT 2011

On 7/23/11 4:01 PM, Vladimir Panteleev wrote:
> On Sat, 23 Jul 2011 23:16:20 +0300, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 7/23/11 1:53 PM, Andrej Mitrovic wrote:
>>> Isn't the biggest issue of large D projects the problems with
>>> incremental compilation (e.g.
>>> https://bitbucket.org/h3r3tic/xfbuild/issue/7/make-incremental-building-reliable),
>>> optlink, and the toolchain?
>>
>> The proposed improvement would mark a step forward in the toolchain
>> and generally in the development of large programs. In particular, it
>> would provide a simple means to decouple compilation of modules used
>> together. It's not easy for me to figure how people don't get it's a
>> net step forward from the current situation.
>
> Then you don't understand what I'm ranting about.

That's a bit presumptuous. I thought about it for a while and concluded 
that I'd do well to explain the current state of affairs.

Consider:

// file a.di
class A {
     int a;
     double b;
     string c;
     void fun();
}

Say the team working on A wants to "freeze" a.di without precluding work 
on A.fun(). In a large project, changing a.di would trigger a lot of 
recompilations, re-uploads, retests, etc., so they'd want to have control 
over that. So they freeze a.di and define a.d as follows:

// file a.d
class A {
     int a = 42;
     double b = 43;
     string c = "44";
     void fun() { assert(a == 42 && b == 43 && c == "44"); }
}

Now the team has achieved their goal: developers can work on A.fun 
without inadvertently messing up a.di. Everybody is happy.

The client code would work like this:

// file main.d
import std.stdio;
import a;

void main() {
     auto a = new A;
     a.fun();
     writeln(a.tupleof);
}

To build and run:

dmd -c a.d
dmd main.d a.o
./main

The program prints "424344" as expected.

The problem with this setup is that it's extremely fragile, in ways that 
are undetectable both at compile time and at run time. For example, just 
swapping a and b in the implementation file makes the program print
"08.96566e-31344", because main.d reads the fields using the layout 
declared in a.di while the object was laid out according to a.d. Similar 
issues occur if fields or methods are added to or removed from one file 
but not the other.

In an attempt to fix this, the developers may add an "import a" to a.d, 
thinking that the compiler would import a.di and would verify the bodies 
of the two classes for correspondence. That doesn't work - the compiler 
simply ignores the import. Things can be tenuously arranged such that 
the .d file and the .di file have different names, but in that case the 
compiler complains about duplicate definitions.

So the programmers conclude they need to define an interface for A (and 
generally each and every hierarchy or isolated class in the project). 
But the same problem occurs for structs, and there's no way to define 
interfaces for structs.

Ultimately the programmers figure there's no way to keep files separate 
without establishing a build mechanism that e.g. generates a.di from 
a.d, compares it against the existing a.di, and complains if the two 
aren't identical. Upon such a build failure, a senior engineer would 
figure out what action to take.
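
For concreteness, here is a rough sketch of what such a check might look 
like as a build step. The function names check_interface and verify_module 
are made up for illustration. It assumes dmd is on the PATH, that -o- 
suppresses object output, and that -H together with -Hf= writes the 
generated interface to the named file (older releases spell the switch 
-Hffilename, without the "="). It also assumes the frozen a.di was itself 
produced by the generator, since a hand-written .di is unlikely to match 
the generated text byte for byte.

```shell
#!/bin/sh
# Sketch only: fail the build when the frozen .di drifts from the .d file.
# check_interface and verify_module are hypothetical helper names.

# Compare a frozen interface file against a freshly generated one.
# Returns 0 when the two are identical, 1 otherwise.
check_interface() {
    frozen=$1
    generated=$2
    if cmp -s "$frozen" "$generated"; then
        echo "OK: $frozen matches the generated interface"
        return 0
    fi
    echo "MISMATCH: $frozen differs from the generated interface" >&2
    diff -u "$frozen" "$generated" >&2
    return 1
}

# Generate the interface for one module into a scratch file, then compare.
# Assumption: "dmd -o- -H -Hf=FILE module.d" writes the header to FILE.
verify_module() {
    module=$1                       # e.g. "a" for a.d and a.di
    scratch=$(mktemp) || return 1
    if ! dmd -o- -H -Hf="$scratch" "$module.d"; then
        rm -f "$scratch"
        return 1
    fi
    check_interface "$module.di" "$scratch"
    rc=$?
    rm -f "$scratch"
    return $rc
}
```

A build script would source this file and call "verify_module a" before 
compiling a.d, stopping on a nonzero return. Note that this only detects 
the drift; deciding whether a.di or a.d is the one that's wrong is still 
left to a human.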

But wait, there's less. The programmers don't have the option of 
grouping method implementations in a hierarchy by functionality (which 
is common in visitation patterns - even dmd does so). They must define 
one class with everything in one place, and there's no way out of that.

My understanding is that the scenarios above are of no value to you, and 
if the language accepted them you'd consider that a degradation of 
the status quo. Given that the status quo includes a fair amount of 
impossible-to-detect failures and tenuous mechanisms, I disagree. Let me 
also play a card I wish I hadn't - I've worked on numerous large 
projects and I can tell from direct experience that the inherent 
problems are... well, odd. Engineers embarked on such projects need all 
the help they can get and would be willing to explore options that 
seem ridiculous for projects a fraction of the size. Improved .di 
generation would be of great help. Enabling other options would be even 
better.

> It is certainly an
> improvement, but:
>
> 1) We don't have an infinity of programmer-hours. I'm saying that the
> time would likely be better spent at improving .di generation, which
> should have a much greater overall benefit per required work unit - and
> for all I can tell, you don't even want to seriously consider this option.

Generation of .di files does not compete with the proposed feature.

> 2) Once manually-maintained .di files are usable, they will be used as
> an excuse to shoo away people working on large projects (people
> complaining about compilation speed will be told to just manually write
> .di files for their 100KLoC projects).

Your ability to predict the future is much better than mine.

Andrei