Understanding Templates: why can't anybody do it?

Sat Mar 17 15:07:57 PDT 2012

On Sat, Mar 17, 2012 at 10:13:44PM +0100, Paulo Pinto wrote:
> Am 17.03.2012 21:20, schrieb Nick Sabalausky:
[...]
> >I've spent a total of about 6 years in college, always got A's and
> >B's in the CS classes, and yet I'm convinced that programming classes
> >are completely and utterly useless. Most of the instructors
> >themselves barely even know how to code (among many, many other
> >problems). So if you're learning programming at college, then I'm not
> >surprised you've gotten the impression that nobody understands
> >templates: Around universities, they probably don't. (Also, it
> >doesn't help that C++'s templates could be a lot better.)
> >
> >I even had one CS prof, he said that the only language he knew was C.
> >And yet, after seeing his feedback on my code, it became abundantly
> >clear to me that he didn't even understand how strings worked (ie:
> >C's famous null-terminated strings) *in the ONE language he knew*.
> >
> 
> I would say that most European universities seem to provide a
> completely different experience, at least speaking from my experience
> in Portuguese/Swiss/German ones.
[...]

My experience in a Canadian university wasn't quite that bad either.
True, there *were* some courses where the profs don't really care about
teaching, or where they're so self-absorbed that you can barely even
begin to understand where their sentences start and end, let alone
understand what they're trying to say. But I did have quite a few good
programming classes, notably one that got me started off with C, and
another that got me started with C++.

Of course, I also learned a lot more stuff about C/C++ that wasn't
taught in class, but then you can hardly expect the prof to teach you
*everything* there is to know about C/C++. After all, you're expected to
be an adult by then, and theoretically you'd know how to learn more on
your own.

Anyway.

Coming back to templates, I think Nick did an excellent job at
explaining them.

When learning something new in programming, I always like to ask, why is
feature X this way?  What is it used for? Why did people invent such a
thing in the first place? What was the motivation behind it? What does
it solve? What's so good about them that I have to learn this?

Templates are the result of a long history of trying to minimize the
amount of code you have to write/change when all you want is to add some
new data to existing code.

Suppose you have a program that needs to keep track of a list of
integers, say. There are, of course, many ways of doing this, but
suppose you chose to implement it as a linked list. So you write
something like this:

	class Node {
		Node next;
		int data;
	}
	class LinkedList {
		Node head;
		Node tail;

		void add(int data) {
			Node n = new Node;
			n.data = data;
			... // insert node into list
		}
		void del(int data) {
			...
		}
	}

So far so good. But what if later on, you realize that you need to store
not just an int, but also a float as well? So you'd have to go through
all that code, and add a new field wherever it's needed:

	class Node {
		Node  next;
		int   intdata;
		float floatdata;
	}
	class LinkedList {
		Node head;
		Node tail;

		void add(int idata, float fdata) {
			Node n = new Node;
			n.intdata = idata;
			n.floatdata = fdata;
			... // insert node into list
		}
		void del(int data, float fdata) {
			...
		}
	}

OK, that's a lot of code changes for something as simple as adding a
single field to your list nodes. What if you need a string next? You
have to go through the entire code and insert string fields and string
arguments everywhere. And what if you need something else after that?
Again, you have to go through all your code and update everything. (And
perhaps miss a few places, causing subtle bugs to creep into your
program.)

We can do better. We can encapsulate the data inside a separate struct
so that we don't have to keep changing Node and LinkedList every time we
need to store new data:

	struct Data {
		int intdata;
		float floatdata;
	}
	class Node {
		Node next;
		Data data;	// ahhh, much better!
	}
	class LinkedList {
		Node head;
		Node tail;

		void add(Data d) {
			Node n = new Node;
			n.data = d;
			... // insert node into list
		}
		void del(Data d) {
			...
		}
	}

Now, what if we need to add a string into our list of data? No problem,
just add a single line to struct Data:

	struct Data {
		int intdata;
		float floatdata;
		string strdata;
	}

See how you don't have to change Node and LinkedList at all? That's
good, because you may have spent hours debugging the linked list code,
and so not making any more changes ensures no new bugs will be
introduced. Plus you've saved yourself a lot of work going through
everything to add the new field.

So far so good.

But what if you need *another* list? Say, a list that stores a different
kind of data? You can't change struct Data, because you still need to
use it in the earlier part of the program. So you'd have to write the
code all over again, except with a different struct:

	struct OtherData {
		int x,y,z;
	}
	class OtherNode {
		OtherNode next;
		OtherData data;
	}
	class OtherLinkedList {
		OtherNode head;
		OtherNode tail;

		void add(OtherData d) {
			OtherNode n = new OtherNode;
			n.data = d;
			... // insert node into list
		}
		void del(OtherData d) {
			...
		}
	}

Wait a minute here! This code looks identical to the previous code,
except that the contents of the struct is different, and Node is renamed
to OtherNode, Data is renamed to OtherData, etc.. The algorithms for
adding/removing nodes from the linked list is exactly the same, except
that a different kind of data is being passed around.

Do we *really* have to duplicate the entire linked list code just so we
can support two kinds of linked list?

If you think about it, there's nothing about the concept of a linked
list that is specifically tied to Data or OtherData. A linked list is
simply a linked list; it's a sequences of nodes that contain data. What
that data is, is irrelevant. The way you add a node to a linked list
that contains Data is exactly the same way you add a node to a linked
list that contains OtherData. So Node and LinkedList really shouldn't be
tied to Data or OtherData at all.

This is where templates come in. It allows us to say, a linked list
consists of nodes linked to each other in a chain, and stores some kind
of data X, but what X is, is decided by the code elsewhere; it's not
important to the linked list itself.

Here's how you use templates to do this:

	class Node(X) {
		Node next;
		X d;
	}
	class LinkedList(X) {
		Node!X head;
		Node!X tail;

		void add(X data) {
			Node!X n = new Node!X();
			n.d = data;
			... // insert node into list
		}
		void del(X data) {
			...
		}
	}

Looks familiar? It looks just like our code from before, except that now
the symbol X is not bound to any particular thing. For example, to
create a linked list for storing struct Data, we can do this:

	LinkedList!Data  datalist;

When you write this, the compiler substitutes Data for X, and then
automatically replaces all instances of X in the definition of Node and
LinkedList with "Data". So this gives us a list identical to the one we
wrote at first.

What about when we need to store OtherData? Now we don't have to
duplicate the entire linked list code, we just write:

	LinkedList!OtherData  otherlist;

And the compiler automatically substitutes OtherData for all occurrences
of X in Node and LinkedList. So it effectively "copy-n-paste"s the
linked list code, just like we did by hand in the second example, except
that it's now automatic.

What if we want a linked list that stores string? Easy:

	LinkedList!string  strlist;

A linked list that stores doubles? No problem:

	LinkedList!double  dlist;

What about a linked list that stores linked lists? This would give you a
nasty headache if you had to implement it by hand. You'd have to go
through the linked list code, figure out which code applies to the
sublists, and which code applies to the main list, and in all likelihood
you'll end up with many pages of code full of bugs everywhere, because
it's just too complicated.

But this is no problem at all with templates.  Here's a linked list that
stores linked lists of ints:

	LinkedList!(LinkedList!int) nestedlist;

And here's a linked list that stores linked lists of Data:

	LinkedList!(LinkedList!Data) nestedlist;

The compiler automatically does the parameter substitution and embedding
the lists inside each other. We only need to write a single line of
code. Instead of 200 lines of code (and 199 bugs).

This is the basic idea behind templates.

Of course, templates can do a lot more than just make it easy to
implement data structures. Now that we know the compiler can substitute
template parameters, we can teach it to do a lot more tricks.  But this
should give you the basic motivations behind why we should have such
things as templates, and why they are useful.

T

-- 
It is of the new things that men tire --- of fashions and proposals and
improvements and change. It is the old things that startle and
intoxicate. It is the old things that are young. -- G.K. Chesterton