Possible solution to template bloat problem?

Mon Aug 19 15:11:38 PDT 2013

On Monday, 19 August 2013 at 20:23:46 UTC, H. S. Teoh wrote:
> With D's honestly awesome metaprogramming features, templates are liable
> to be (and in fact are) used a LOT. This leads to the unfortunate
> situation of template bloat: every time you instantiate a template, it
> adds yet another copy of the templated code into your object file. This
> gets worse when you use templated structs/classes, each of which may
> define some number of methods, and each instantiation adds yet another
> copy of all those methods.
>
> This is doubly bad if these templates are used only during compile-time,
> and never referenced during runtime. That's a lot of useless baggage in
> the final executable. Plus, it leads to issues like this one:
>
> 	http://d.puremagic.com/issues/show_bug.cgi?id=10833
>
> While looking at this bug, I got an idea: what if, instead of emitting
> template instantiations into the same object file as non-templated code,
> the compiler were to emit each instantiation into a separate static
> *library*? For instance, if you have code in program.d, then the
> compiler would emit non-templated code like main() into program.o, but
> all template instantiations get put in, say, libprogram.a. Then during
> link time, the compiler runs `ld -oprogram program.o libprogram.a`, and
> then the linker will pull in symbols from libprogram.a that are
> referenced by program.o.
>
> If we were to set things up so that libprogram.a contains a separate
> unit for each instantiated template function, then the linker would
> actually pull in only code that is actually referenced at runtime. For
> example, say our code looks like this:
>
> 	struct S(T) {
> 		T x;
> 		T method1(T t) { ... }
> 		T method2(T t) { ... }
> 		T method3(T t) { ... }
> 	}
> 	void main() {
> 		auto sbyte  = S!byte();
> 		auto sint   = S!int();
> 		auto sfloat = S!float();
>
> 		sbyte.method1(1);
> 		sint.method2(2);
> 		sfloat.method3(3.0);
> 	}
>
> Then the compiler would put main() in program.o, and *nothing else*. In
> program.o, there would be undefined references to S!byte.method1,
> S!int.method2, and S!float.method3, but not the actual code. Instead,
> when the compiler sees S!byte, S!int, and S!float, it puts all of the
> instantiated methods inside libprogram.a as separate units:
>
> 	libprogram.a:
> 		struct_S_byte_method1.o:
> 			S!byte.method1
> 		struct_S_byte_method2.o:
> 			S!byte.method2
> 		struct_S_byte_method3.o:
> 			S!byte.method3
> 		struct_S_int_method1.o:
> 			S!int.method1
> 		struct_S_int_method2.o:
> 			S!int.method2
> 		struct_S_int_method3.o:
> 			S!int.method3
> 		struct_S_float_method1.o:
> 			S!float.method1
> 		struct_S_float_method2.o:
> 			S!float.method2
> 		struct_S_float_method3.o:
> 			S!float.method3
>
> Since the compiler doesn't know at instantiation time which of these
> methods will actually be used, it simply emits all of them and puts them
> into the static library.
>
> Then at link-time, the compiler tells the linker to include libprogram.a
> when linking program.o. So the linker goes through each undefined
> reference, and resolves them by linking in the module in libprogram.a
> that defines said reference. So it would link in the code for
> S!byte.method1, S!int.method2, and S!float.method3. The other 6
> instantiations are not linked into the final executable, because they
> are never actually referenced by the runtime code.
>
> So this way, we minimize template bloat to only the code that's actually
> used at runtime. If a particular template function instantiation is only
> used during CTFE, for example, it would be present in libprogram.a but
> won't get linked, because none of the runtime code references it. This
> would fix bug 10833.
>
> Is this workable? Is it implementable in DMD?
>
>
> T

Without link-time optimisation, this prevents inlining, doesn't it?
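
To make the inlining concern concrete, here is a minimal sketch. The `twice` template is hypothetical (it is not from the quoted post); it just stands in for any small templated function that would normally be an obvious inlining candidate. A linker without LTO cannot inline across object files, so the open question is whether the compiler would still inline from the template source it has already seen before the body is emitted into the library. The sketch does not settle that either way.

```d
import std.stdio;

// Hypothetical template: small enough to be an obvious inlining candidate.
T twice(T)(T x)
{
    return x + x;
}

void main()
{
    // Under the proposed scheme, program.o would hold only an undefined
    // reference to twice!int at this call; the body would be emitted as a
    // separate unit in libprogram.a and resolved by the linker.
    int y = twice(21);
    writeln(y); // prints 42
}
```

If the call above ends up as a real call into a unit pulled from libprogram.a rather than being inlined, that is the cost being asked about.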