Possible solution to template bloat problem?

Mon Aug 19 13:22:15 PDT 2013

With D's honestly awesome metaprogramming features, templates are liable
to be (and in fact are) used a LOT. This leads to the unfortunate
situation of template bloat: every time you instantiate a template, it
adds yet another copy of the templated code into your object file. This
gets worse when you use templated structs/classes, each of which may
define some number of methods, and each instantiation adds yet another
copy of all those methods.

This is doubly bad if these templates are used only at compile time and
never referenced at runtime. That's a lot of useless baggage in the
final executable. Plus, it leads to issues like this one:

	http://d.puremagic.com/issues/show_bug.cgi?id=10833

While looking at this bug, I got an idea: what if, instead of emitting
template instantiations into the same object file as non-templated code,
the compiler were to emit them into a separate static *library*? For
instance, if you have code in program.d, then the compiler would emit
non-templated code like main() into program.o, but all template
instantiations get put in, say, libprogram.a. Then at link time, the
compiler runs `ld -oprogram program.o libprogram.a`, and the linker
pulls in only those archive members of libprogram.a that define symbols
referenced by program.o.

If we were to set things up so that libprogram.a contains a separate
unit (object file) for each instantiated template function, then the
linker would pull in only the code that is actually referenced at
runtime. For example, say our code looks like this:

	struct S(T) {
		T x;
		T method1(T t) { ... }
		T method2(T t) { ... }
		T method3(T t) { ... }
	}
	void main() {
		auto sbyte  = S!byte();
		auto sint   = S!int();
		auto sfloat = S!float();

		sbyte.method1(1);
		sint.method2(2);
		sfloat.method3(3.0);
	}

Then the compiler would put main() in program.o, and *nothing else*. In
program.o, there would be undefined references to S!byte.method1,
S!int.method2, and S!float.method3, but not the actual code. Instead,
when the compiler sees S!byte, S!int, and S!float, it puts all of the
instantiated methods inside libprogram.a as separate units:

	libprogram.a:
		struct_S_byte_method1.o:
			S!byte.method1
		struct_S_byte_method2.o:
			S!byte.method2
		struct_S_byte_method3.o:
			S!byte.method3
		struct_S_int_method1.o:
			S!int.method1
		struct_S_int_method2.o:
			S!int.method2
		struct_S_int_method3.o:
			S!int.method3
		struct_S_float_method1.o:
			S!float.method1
		struct_S_float_method2.o:
			S!float.method2
		struct_S_float_method3.o:
			S!float.method3

Since the compiler doesn't know at instantiation time which of these
methods will actually be used, it simply emits all of them and puts them
into the static library.

Then at link time, the compiler tells the linker to include libprogram.a
when linking program.o. The linker goes through each undefined reference
and resolves it by linking in the unit in libprogram.a that defines said
reference. So it would link in the code for S!byte.method1,
S!int.method2, and S!float.method3. The other 6 instantiations are not
linked into the final executable, because they are never actually
referenced by the runtime code.
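
To make the resolution step concrete, here is a toy model of it in D. It
is only a sketch of the idea, not how a real linker is implemented: the
Member struct, resolve() and exampleArchive() are names I made up for
illustration, and it assumes each archive member defines exactly one
symbol, as in the layout above.

```d
import std.algorithm : canFind;
import std.stdio : writeln;

// Hypothetical model of one unit inside libprogram.a.
struct Member
{
    string file;     // e.g. "struct_S_byte_method1.o"
    string defines;  // the single symbol this unit defines
    string[] needs;  // symbols this unit itself references
}

// Return the files a linker would extract: only members that satisfy a
// pending undefined reference, following their own references too.
string[] resolve(Member[] archive, string[] undefined)
{
    string[] pulled;
    string[] pending = undefined.dup;
    while (pending.length > 0)
    {
        string sym = pending[0];
        pending = pending[1 .. $];
        foreach (m; archive)
        {
            if (m.defines == sym && !pulled.canFind(m.file))
            {
                pulled ~= m.file;
                pending ~= m.needs;
            }
        }
    }
    return pulled;
}

// Build the 9-member archive from the example above.
Member[] exampleArchive()
{
    Member[] archive;
    foreach (type; ["byte", "int", "float"])
        foreach (method; ["method1", "method2", "method3"])
            archive ~= Member("struct_S_" ~ type ~ "_" ~ method ~ ".o",
                              "S!" ~ type ~ "." ~ method);
    return archive;
}

void main()
{
    auto archive = exampleArchive();
    // The undefined references left in program.o by main().
    auto pulled = resolve(archive,
        ["S!byte.method1", "S!int.method2", "S!float.method3"]);
    writeln("linked in: ", pulled);
    writeln("left out:  ", archive.length - pulled.length, " members");
}
```

A real linker also cares about the order of archives on the command line
and about symbols defined in more than one place; the model ignores both.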

So this way, we minimize template bloat to only the code that's actually
used at runtime. If a particular template function instantiation is only
used during CTFE, for example, it would be present in libprogram.a but
won't get linked, because none of the runtime code references it. This
would fix bug 10833.
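
Here is a small example of the CTFE-only case I have in mind. The
power() helper and the kilo constant are invented for illustration; the
point is only that nothing in main() calls the instantiation at runtime.

```d
import std.stdio : writeln;

// Hypothetical helper: a templated function that is only ever called
// in a compile-time context below.
T power(T)(T base, uint exp)
{
    T result = 1;
    foreach (i; 0 .. exp)
        result *= base;
    return result;
}

// `enum` forces the call to be evaluated at compile time (CTFE), so the
// result is stored in the program as a plain constant.
enum kilo = power!int(2, 10);
static assert(kilo == 1024);

void main()
{
    // Only the constant is used; there is no runtime call to power!int.
    writeln(kilo); // prints 1024
}
```

Under the proposed scheme, the code for power!int would sit in
libprogram.a, and since program.o has no undefined reference to it, the
linker would have no reason to pull it in.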

Is this workable? Is it implementable in DMD?

T

-- 
Nearly all men can stand adversity, but if you want to test a man's character, give him power. -- Abraham Lincoln