Proposed improvements to the separate compilation model
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sat Jul 23 14:54:57 PDT 2011
On 7/23/11 4:01 PM, Vladimir Panteleev wrote:
> On Sat, 23 Jul 2011 23:16:20 +0300, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 7/23/11 1:53 PM, Andrej Mitrovic wrote:
>>> Isn't the biggest issue of large D projects the problems with
>>> incremental compilation (e.g.
>>> https://bitbucket.org/h3r3tic/xfbuild/issue/7/make-incremental-building-reliable),
>>>
>>> optlink, and the toolchain?
>>
>> The proposed improvement would mark a step forward in the toolchain
>> and generally in the development of large programs. In particular, it
>> would provide a simple means to decouple compilation of modules used
>> together. It's not easy for me to figure how people don't get it's a
>> net step forward from the current situation.
>
> Then you don't understand what I'm ranting about.
That's a bit presumptuous. I thought about it for a while and concluded
that I'd do well to explain the current state of affairs a bit.
Consider:

// file a.di
class A {
    int a;
    double b;
    string c;
    void fun();
}

Say the team working on A wants to "freeze" a.di without precluding work
on A.fun(). In a large project, changing a.di would trigger a lot of
recompilations, re-uploads, the need for retests etc. so they'd want to
have control over that. So they freeze a.di and define a.d as follows:

// file a.d
class A {
    int a = 42;
    double b = 43;
    string c = "44";
    void fun() { assert(a == 42 && b == 43 && c == "44"); }
}

Now the team has achieved their goal: developers can work on A.fun
without inadvertently messing up a.di. Everybody is happy.
The client code would work like this:

// file main.d
import std.stdio;
import a;
void main() {
    auto a = new A;
    a.fun();
    writeln(a.tupleof);
}

To build and run:

dmd -c a.d
dmd main.d a.o
./main

The program prints "424344" as expected.
The problem with this setup is that it's extremely fragile, in ways that
are undetectable during compilation or runtime. For example, just
swapping a and b in the implementation file makes the program print
"08.96566e-31344". Similar issues occur if fields or methods are added
or removed from one file but not the other.
In an attempt to fix this, the developers may add an "import a" to a.d,
thinking that the compiler would import a.di and would verify the bodies
of the two classes for correspondence. That doesn't work - the compiler
simply ignores the import. Things can be tenuously arranged such that
the .d file and the .di file have different names, but in that case the
compiler complains about duplicate definitions.
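
The correspondence check the developers were hoping for can be approximated by hand when both declarations are visible under different names. The sketch below is only an illustration under that assumption: ADecl and AImpl are hypothetical stand-ins for the class in a.di and the class in a.d (the real files both declare A in module a, so they cannot be compared this way), and sameLayout is a made-up helper. It compares field names and types only, not methods.

```d
import std.traits : Fields, FieldNameTuple;

// Hypothetical stand-in for the declaration frozen in a.di.
class ADecl { int a; double b; string c; }

// Hypothetical stand-in for the class in a.d, with a and b swapped.
class AImpl { double b = 43; int a = 42; string c = "44"; }

// True only when both types declare the same field types and the same
// field names, in the same order.
enum sameLayout(X, Y) =
    is(Fields!X == Fields!Y) &&
    [FieldNameTuple!X] == [FieldNameTuple!Y];

// A type trivially matches itself.
static assert(sameLayout!(ADecl, ADecl));

// The swap is caught at compile time instead of surfacing as garbage output.
static assert(!sameLayout!(ADecl, AImpl));

void main() {}
```

In a real build the second assertion would be written without the negation, so that a swap like the one above stops compilation.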
So the programmers conclude they need to define an interface for A (and
generally each and every hierarchy or isolated class in the project).
But the same problem occurs for structs, and there's no way to define
interfaces for structs.
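
For classes, the interface route might look roughly like the sketch below. IA and makeA are hypothetical names that do not appear in the thread, and everything is placed in one file only to keep the example self-contained; in practice the interface would live in the frozen module and the class in the implementation module.

```d
// Hypothetical frozen part: the only thing clients would import.
interface IA {
    void fun();
}

// Implementation side. The compiler rejects A if it does not implement
// every method declared in IA.
class A : IA {
    int a = 42;
    double b = 43;
    string c = "44";
    void fun() { assert(a == 42 && b == 43 && c == "44"); }
}

// Hypothetical factory, so that client code never names A directly.
IA makeA() { return new A; }

void main() {
    IA obj = makeA();
    obj.fun(); // called through the interface; the fields stay private to A's module
}
```

The fields are not reachable through IA, so the writeln(a.tupleof) call in main.d has no equivalent here.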
Ultimately the programmers figure there's no way to keep files separate
without establishing a build mechanism that e.g. generates a.di from
a.d, compares it against the existing a.di, and complains if the two
aren't identical. Upon such a build failure, a senior engineer would
figure out what action to take.
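
Such a build step could be sketched as a small D program, assuming dmd is on the PATH and is run from the directory holding a.d and the frozen a.di. The scratch file name a.generated.di and the helper interfaceUnchanged are hypothetical. The comparison is purely textual, so the frozen a.di would itself have to be in the form the generator emits.

```d
import std.file : readText;
import std.process : execute;
import std.stdio : stderr;

// Comparison step, kept separate so it can be checked without running dmd.
bool interfaceUnchanged(string frozen, string generated)
{
    return frozen == generated;
}

int main()
{
    // -o- suppresses object file output, -H generates a .di file,
    // -Hf= names the generated file (a hypothetical scratch name here).
    auto result = execute(["dmd", "-o-", "-H", "-Hf=a.generated.di", "a.d"]);
    if (result.status != 0)
    {
        stderr.writeln(result.output);
        return result.status;
    }

    if (!interfaceUnchanged(readText("a.di"), readText("a.generated.di")))
    {
        stderr.writeln("a.di no longer matches the interface generated from a.d");
        return 1; // fail the build so that someone decides what to do
    }
    return 0;
}
```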
But wait, there's less. The programmers don't have the option of
grouping method implementations in a hierarchy by functionality (which
is common in visitation patterns - even dmd does so). They must define
one class with everything in one place, and there's no way out of that.
My understanding is that the scenarios above are of no value to you, and
if the language would accept them you'd consider that a degradation of
the status quo. Given that the status quo includes a fair number of
impossible-to-detect failures and tenuous mechanisms, I disagree. Let me
also play a card I wish I hadn't - I've worked on numerous large
projects and I can tell from direct experience that the inherent
problems are... well, odd. Engineers embarked on such projects need all
the help they can get and would be willing to explore options that
seem ridiculous for projects a fraction of the size. Improved .di
generation would be of great help. Enabling other options would be even
better.
> It is certainly an
> improvement, but:
>
> 1) We don't have an infinity of programmer-hours. I'm saying that the
> time would likely be better spent at improving .di generation, which
> should have a much greater overall benefit per required work unit - and
> for all I can tell, you don't even want to seriously consider this option.
Generation of .di files does not compete with the proposed feature.
> 2) Once manually-maintained .di files are usable, they will be used as
> an excuse to shoo away people working on large projects (people
> complaining about compilation speed will be told to just manually write
> .di files for their 100KLoC projects).
Your ability to predict the future is much better than mine.
Andrei