[typing] Type-erasure re generics

Wed Sep 29 09:18:09 PDT 2010

On Wednesday 29 September 2010 08:14:38 Justin Johansson wrote:
> Sorry if I should ask to this on D.help.
> 
> Can someone please inform me if D does type-erasure on generic
> types as does Java?
> 
> Here's one Java reference on the subject that I found.
> http://download.oracle.com/javase/tutorial/java/generics/erasure.html
> 
> Also I would like to be informed as to the positives and negatives
> aspects of type-erasure as might be appropriate in a D versus Java
> context.
> 
> Thanks for answers,
> 
> -- Justin Johansson

Wow. Goodness no. Generics and templates are _completely_ different.

Java is the only language that I'm aware of that does type erasure, and the only 
reason that it does it is that they needed generics to be backwards compatible, 
so the generated bytecode is the same with generics as it is without. So, 
generics are pretty much just compile-time checks in Java. I believe that C# 
does generics in a way that results in multiple types using the same code but 
that the type information is maintained with the type, so the VM knows about the 
generics. D's templates are more like C++'s templates.

In both D and C++, templates are literally for generating code. If you declare

struct S(T)
{
	T val;
}

and use S!int and S!double, then two struct types are created (typically you 
would say that S was instantiated twice):

struct S!int
{
	int val;
}

struct S!double
{
	double val;
}

They are as separate as if you had declared

struct SInt
{
	int val;
}

struct SDouble
{
	double val;
}
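
A minimal sketch of how you could check this yourself (SInt here is just the 
hand-written struct from above, and main is only there to make it runnable). The 
static asserts are evaluated at compile time, so the program only builds if the 
two instantiations really are distinct, unrelated types:

```d
struct S(T)
{
    T val;
}

// Hand-written equivalent of S!int, as in the text above.
struct SInt
{
    int val;
}

// Each instantiation is its own type.
static assert(!is(S!int == S!double));

// Same layout as the hand-written struct, but still a different type.
static assert(!is(S!int == SInt));

// No implicit conversion between instantiations either.
static assert(!is(S!double : S!int));

// T has been replaced by the concrete type in each instantiation.
static assert(is(typeof(S!int.init.val) == int));
static assert(is(typeof(S!double.init.val) == double));

void main()
{
    S!int a;
    a.val = 42;

    S!double b;
    b.val = 1.5;

    assert(a.val == 42);
    assert(b.val == 1.5);
}
```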

If you used only S!int, then only the code for S!int would be created. If you 
used S!int, S!double, and S!float, then three sets of code would be created, each 
with its own type. If you didn't use any S, then no code would be created. This 
is so literal that if you declare

struct S(T)
{
	unittest
	{
		...
	}

	T val;
}

then if you don't declare any S's, the unittest code (with whatever it 
does) doesn't exist and is never run, while if you declared S!int, S!double, 
S!float, and S!(S!long), you'd end up with 5 versions of it, and it would run 5 
times (5, since S!(S!long) would generate both the S!long and S!(S!long) types).
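
The same per-instantiation duplication applies to everything inside the 
template, not just unittest blocks. As a rough sketch (the static member count 
is made up for illustration and isn't part of the example above), each 
instantiation gets its own copy of a static variable, and instantiating 
S!(S!long) drags in S!long as a separate type:

```d
struct S(T)
{
    static int count;   // one separate variable per instantiation
    T val;
}

void main()
{
    // Writing to one instantiation's static does not touch the other's.
    S!int.count = 3;
    S!double.count = 7;
    assert(S!int.count == 3);
    assert(S!double.count == 7);

    // S!(S!long) needs S!long for its val member, so both types exist.
    S!(S!long) nested;
    static assert(is(typeof(nested.val) == S!long));
    static assert(!is(S!long == S!(S!long)));
}
```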

Neither D nor C++ templates are saying to create a class or struct which takes 
any type. They're saying to generate that code for each type that you use with 
it. It's literally as if you had copied and pasted it every time you needed it 
for a new type, and replaced T with the new type for each. Take D's eponymous 
templates for instance (this is from std.metastrings):

/**
 * Convert constant argument to a string.
 */

template toStringNow(ulong v)
{
    static if (v < 10)
        enum toStringNow = "" ~ cast(char)(v + '0');
    else
        enum toStringNow = toStringNow!(v / 10) ~ toStringNow!(v % 10);
}

unittest
{
    static assert(toStringNow!(1uL << 62) == "4611686018427387904");
}

Using toStringNow!(1uL << 62) recursively generates code, each time 
replacing the template with the enum of the same name until all you have left 
is the generated value.
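
If you want to try this out without depending on std.metastrings, here is a 
self-contained sketch. toStr is a local copy of the template under a made-up 
name, and the array is only there to show that the result is usable wherever a 
compile-time constant is required:

```d
// Local copy of the template above, renamed so it stands on its own.
template toStr(ulong v)
{
    static if (v < 10)
        enum toStr = "" ~ cast(char)(v + '0');
    else
        enum toStr = toStr!(v / 10) ~ toStr!(v % 10);
}

// Checked entirely at compile time; nothing is computed at run time.
static assert(toStr!7 == "7");
static assert(toStr!1234 == "1234");

// The result is a constant, so it can size a static array.
int[toStr!1234.length] arr;
static assert(arr.length == 4);

void main()
{
    // By the time the program runs, this is just a string literal.
    string s = toStr!1234;
    assert(s == "1234");
}
```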

toStringNow!(1234); becomes

template toStringNow!1234
{
    enum toStringNow!1234 = toStringNow!(123) ~ toStringNow!4;
}

template toStringNow!123
{
    enum toStringNow!123 = toStringNow!(12) ~ toStringNow!3;
}

template toStringNow!4
{
    enum toStringNow!4 = "" ~ '4';
}

template toStringNow!12
{
    enum toStringNow!12 = toStringNow!1 ~ toStringNow!2;
}

template toStringNow!2
{
    enum toStringNow!2 = "" ~ '2';
}

template toStringNow!1
{
    enum toStringNow!1 = "" ~ '1';
}

which is reduced to

template toStringNow!1234
{
    enum toStringNow!1234 = "1234";
}

which is reduced to

"1234"

The code is literally generated - in this case, recursively so. You can't 
even dream of doing this sort of thing in Java. And because the code is 
generated, the type information is always there as much as it would have been 
had you produced all of that code by hand. The type of S!int won't be erased to 
S any more than SInt would have been. The different template instantiations are 
completely new sets of code. In Java,

class S<T>
{
	T val;
}

is really

class S
{
	Object val;
}

and that's why Java has to do all kinds of casts and box primitive types and 
whatnot. Since C++ and D generate completely new sets of code for each template 
instantiation, they don't have that problem. No casts are necessary, and 
primitive types are used directly.
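
A small sketch of what "used directly" means in D (readBack is a made-up helper 
name). The instantiated struct is exactly as big as the primitive that it holds, 
and reading the field back needs no cast:

```d
struct S(T)
{
    T val;
}

// No wrapper object or box: the struct is just the primitive.
static assert(S!int.sizeof == int.sizeof);
static assert(S!double.sizeof == double.sizeof);

// Store and load an int with no casts anywhere.
int readBack()
{
    S!int s;
    s.val = 5;
    int x = s.val;
    return x;
}

// Simple enough to be evaluated at compile time as well.
static assert(readBack() == 5);

void main()
{
    assert(readBack() == 5);
}
```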

Now, this is often accused of causing code bloat (though a particularly advanced 
compiler can actually use the same code underneath for types of the same size - 
so S!int and S!float would be joined into one while S!long wouldn't be; dmd 
doesn't do that at this point; it would be an optimization to do so), and C# or 
Java folks might point that out as a reason that C#'s and Java's approaches are 
better. However, C#'s and Java's approaches make generics pretty much only useful 
for container classes. They can't even dream of stuff like eponymous templates. 
So, while they make gains in the size of the code, they lose out on a _lot_ of 
power.
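
As a rough illustration of the sort of power that per-type code generation 
gives you outside of containers (the function name twice is made up for this 
sketch), a template can pick a completely different body for each type that it 
is instantiated with:

```d
// One template, but each instantiation compiles only the branch that
// applies to its type.
T twice(T)(T x)
{
    static if (is(T : const(char)[]))
        return x ~ x;   // string-like types: concatenate
    else
        return x + x;   // everything else used here: add
}

// Three instantiations, three separately generated functions.
static assert(twice(21) == 42);
static assert(twice(1.5) == 3.0);
static assert(twice("ab") == "abab");

void main()
{
    assert(twice(21) == 42);
    assert(twice("ab") == "abab");
}
```

The branch which isn't taken is never compiled for that instantiation, which is 
why x + x never has to make sense for a string.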

C++ and D templates are literally code generation mechanisms, and if you want to 
make full use of them, you need to realize that every time that you instantiate 
a new template, you are literally generating new code. That has all kinds of 
powerful implications, and Phobos definitely takes advantage of them. It's not 
without cost (it _is_ a tradeoff between space and power), but the benefit is so 
large that most of us would agree that the benefits dwarf the cost in space. And 
maybe someday dmd will get the aforementioned optimization, which would make 
template instantiations with the same size for their type actually use one set 
of underlying code. But there's no way that I'd trade D templates for C#'s 
generics which use one set of code but maintain type information, let alone 
Java's which use one set of code and erase all type information.

- Jonathan M Davis