Possible solution to template bloat problem?

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Aug 20 10:58:05 PDT 2013


On Tue, Aug 20, 2013 at 07:18:50PM +0200, Ramon wrote:
> On Tuesday, 20 August 2013 at 17:05:21 UTC, Dicebot wrote:
> >On Tuesday, 20 August 2013 at 16:59:00 UTC, Ramon wrote:
> >>I'm afraid the issue is bigger.
> >
> >Your insight about variety of modern programming language
> >applications is extremely limited. If you are willing to introduce
> >random runtime costs for nothing, there are lot of other awesome
> >languages that can satisfy your needs.
> >
> >As someone who does not want to write embedded'ish code in C
> >anymore (and hopes to drag D there eventually) I am dangerously
> >close to hating you.
> 
> And that shows. Well, if that fits your definition of
> "professional", so be it.
> 
> To avoid misunderstandings: I still like D and think that it's a
> great language/solution. I still see very attractive points. I still
> think that even templates can be attractive and are way better done
> in D than in C++.
> 
> I just don't agree that writing "generics" in the brochure and
> actually delivering templates is a good thing. And I happen to think
> that real generics are a very good thing.
[...]

I think the problem is that your definition of "generic" is not the same
as ours. :)

Templates actually include your definition of generics if you use it
correctly. Here's an example:

	interface LessThanComparable {
		bool opCmp(LessThanComparable b);
	}

	interface LessThanComparableRange {
		@property bool empty();
		@property LessThanComparable front();
		void popFront();
	}

	void sortRange(LessThanComparableRange range) {
		// Note: single template instantiation of sort here
		std.algorithm.sort(range);
	}

	class MyClass : LessThanComparable {
		override bool opCmp(LessThanComparable b) { ... }
		...
	}

	class MyClassRange : LessThanComparableRange {
		override @property bool empty() { ... }
		override @property LessThanComparable front() { ... }
		override void popFront() { ... }
	}

	class MyOtherClass : LessThanComparable {
		override bool opCmp(LessThanComparable b) { ... }
		...
	}

	class MyOtherClassRange : LessThanComparableRange {
		override @property bool empty() { ... }
		override @property LessThanComparable front() { ... }
		override void popFront() { ... }
	}

	void main() {
		MyClassRange firstRange = ...;
		MyOtherClassRange secondRange = ...;
		...
		sortRange(firstRange);
		sortRange(secondRange);
	}

A few notes:

- LessThanComparable lets you do polymorphism at runtime, so sortRange
  is a non-template "real generic" function that can sort any range
  involving LessThanComparable.

- LessThanComparableRange provides a single ABI for any range of
  LessThanComparable elements.

- sortRange instantiates the template function std.algorithm.sort
  exactly *once*, and it will work with any runtime type that implements
  LessThanComparableRange.

- The rest of the code shows how you can define a bunch of classes that
  implement these interfaces, and they can be used with the sortRange
  function with full runtime polymorphism.

You will note that there's some amount of boilerplate here -- because I
just typed this up off the top of my head for illustration purposes; in
real code you'd use mixins or other such stuff, perhaps *cough* use a
template for generating all the xxxRange classes *cough* automatically.

So this lets you have "real generics" which, as you can see, is really
just a subset of what is covered by the template std.algorithm.sort,
which can handle *any* concrete type, even those that don't implement
any runtime polymorphic interfaces.

Now let's talk about about how templates and "real generics" can work
together.

If you think about the above code carefully, you will realize that, at
the machine level, you cannot avoid the overhead of dereferencing
pointers to the various interfaces and class vtables, because the CPU
can only deal with concrete types, it doesn't know how to work with data
that can be any type. So, at *some* level, whether visible at the code
level or not, everything must be translated down to type-specific code
that the CPU can actually run. In "real generics", this is accomplished
by having a single ABI (the interfaces LessThanComparable and
LessThanComparableRange) that all concrete types must implement. Once
implemented, the sortRange function doesn't have to deal with concrete
types anymore: it can express the sorting algorithm directly in terms of
the interfaces, and when a concrete operation like '<' is desired, it
invokes LessThanComparable's opCmp method to accomplish it. (Which, in
turn, essentially dereferences a function pointer that points to the
actual machine code that does the comparison of the concrete type.)

This, of course, has a lot of runtime costs: you need space to store the
vtables, you need to initialize the pointers to these vtables every time
you create a new instance of a class, it requires runtime overhead for
calling a function via a function pointer instead of directly, etc.. The
advantage, though, is that it allows sortRange to be "real generic", in
the sense that you can load up a dynamic library at runtime that returns
a range of elements of unknown type, and sortRange will be able to sort
it just by using the LessThanComparable* interfaces.

Now let's think about the template version of sortRange, which is just
std.algorithm.sort. Here, the binding to the concrete type happens at
compile-time; rather than produce a single machine code for sortRange
that handles polymorphism via indirection (interfaces, function
pointers, etc.), the template produces code that sorts a range of a
*specific* type. Since we know exactly what concrete types we're dealing
with, we don't need any of the indirections of the "real generic"
approach; the compiler's optimizer can take advantage of characteristics
of the concrete type to produce the most optimized machine code for
sorting ranges of that kind. Each instantiation of std.algorithm.sort
produces optimal machine code for sorting that particular range, because
all the bindings to the concrete type is done at compile-time rather
than runtime. This gives you the top performance.

Of course, in programming, nothing comes for free; so the price for this
top performance is that you need to produce many copies of the sort
function -- one for each type of range, each optimized for that
particular type of range. And on the surface, it would appear that it
would be unable to handle runtime polymorphism either, because the
template must be bound to the concrete types at compile-time. Right?

Actually, this is where the power of templates is shown: in the example
code I gave above, I deliberately implemented sortRange with
std.algorithm.sort. But actually, the way I wrote it is kind of
unnecessary, because std.algorithm.sort can be instantiated *directly*
with LessThanComparableRange! So actually, we don't even need sortRange
at all -- we can just call std.algorithm.sort directly on our "real
generic" containers, and the compiler will produce a "real generic"
version of sort that has all the runtime indirections that I described
above, for handling runtime polymorphic objects, dynamically-loaded
objects, etc..

Armed with this insight, we can now see that we can actually have the
best of both worlds: if we already know the concrete types at
compile-time, the template system optimizes the sort function at
compile-time to deal specifically with that concrete type -- you get
optimal performance. But if we don't know the concrete type at
compile-time, we create an interface to be implemented by future
concrete types (say by a 3rd party vendor), and the template system
produces a "real generic" version of sort that can handle runtime
polymorphism.

IOW, the templating system *includes* what you describe as "real
generics"; it is not inferior to it at all! In fact, the templating
system can do what "real generics" can't: produce optimal code for a
specific type without needing to copy-n-paste code (either manually or
with an IDE or whatever). It's the *compiler* that instantiates the
template with the concrete type, so it has access to all the specific
implementation details of said concrete type which it can use to produce
the best machine code for it -- automatically.


T

-- 
Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder


More information about the Digitalmars-d mailing list