The "type" type dream

Robert Fraser fraserofthenight at gmail.com
Tue Nov 25 14:31:02 PST 2008


bearophile wrote:
> From what I've seen so far if you want a very flexible language there are two solutions:
> 1) you can attach types to values, and manage types at runtime, ending with dynamic languages like Ruby, Python, Scheme, CLIPS, Tcl, and so on.
> 2) Otherwise you probably need a very flexible static type system, with strong type-inference capabilities, capable of managing types in a refined way. A Hindley–Milner type inference algorithm may suffice:
> http://en.wikipedia.org/wiki/Type_inference#Hindley.E2.80.93Milner_type_inference_algorithm
> http://web.archive.org/web/20050911123640/http://www.cs.berkeley.edu/~nikitab/courses/cs263/hm.html

Static type inference + OO = disaster.

Say a module consists of this:
class A { void add(int    o); }
class B { void add(string o); }
f(x, y) { x.add(y); }

What are the types of x and y in f? The only correct answer is that 
there are two versions of f:
void f(A x, int    y) { x.add(y); }
void f(B x, string y) { x.add(y); }

This may not have been intended by the programmer (if there are 30 
classes with an "add" method, many of which are totally unrelated, it's 
very likely the programmer didn't intend to create overloads for every 
single one). If the programmer did intend a sort of "compile-time duck 
typing", this will propagate to every caller of f.
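For contrast, a dynamic language resolves this at the call site instead of generating overloads. Here's a minimal Python sketch of the same two-class module (names mirror the example above; it's just an illustration, not anything from D):

```python
# Runtime duck typing: f works for any x whose class defines add(y).
# The check happens at the call, so no per-class overloads are generated.

class A:
    def add(self, o):
        return f"A got int {o}"

class B:
    def add(self, o):
        return f"B got string {o}"

def f(x, y):
    # Dispatch is resolved at runtime by looking up "add" on x.
    return x.add(y)

print(f(A(), 3))
print(f(B(), "hi"))
```

The price, of course, is that passing an object with no "add" method fails at runtime rather than at compile time.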

The big problem, however, is efficiency. Every method name is bound to 
many possible classes, each of which must be considered. These 
possibilities explode up the type graph, with a Cartesian product of all 
possible arguments being created at each function declaration. This 
means that literally thousands of versions of a function might need to 
be created for any non-trivial function (some of this can be turned into 
branches or broken down into multiple functions). It's simply not a 
scalable solution. For an example, take a look at ShedSkin (which does a 
combination of this method and flow analysis): it works great for up 
to 200 LOC, but in its current form will likely never work for 
enterprise-size applications.
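A quick back-of-the-envelope count (with made-up numbers) shows how fast that Cartesian product grows:

```python
# Hypothetical: each of k inferred parameters independently matches c
# candidate classes, so a single function can need up to c**k
# specializations. The figures below are illustrative, not measured.
c = 30  # classes defining an "add"-style method
k = 3   # inferred parameters
print(c ** k)  # 27000
```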

The last problem: every class must be known at the time of compiling one 
module, which is a killer for complete type inference (TI) in D anyway.

TI + OO works very well in the forward case (flow analysis, i.e. "this 
type is X here, so these are the possible types it can be there..."). 
This has been put to good use in optimizing Java/.NET. However, if a 
function is going to be exported for use outside a module, flow analysis 
won't work, since another module has to be able to call it with 
unexpected types. It can also prove to be very slow unless localized to 
a small region (see how long it takes to run Coverity on a sufficiently 
large project... and that's just tracking nulls through code-path 
analysis; imagine it tracking type sets).

Static type inference is best left to a language like Haskell: the type 
of a function (its return, arguments, etc.) can be determined by 
looking only at the function itself, because every possible function it 
can call is within a known namespace.
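To make that concrete, the heart of Hindley–Milner inference is a small unification routine. Here's a rough Python sketch (type variables are lowercase strings, type constructors are tuples tagged with a capitalized name; everything here is illustrative, and the occurs check is omitted for brevity):

```python
# Minimal unification, the core of Hindley-Milner inference.
# A type is either a variable ("a") or a constructor ("Fn", arg, ret).

def is_var(t):
    return isinstance(t, str) and t[0].islower()

def resolve(t, subst):
    # Follow the substitution chain until we hit an unbound var or a constructor.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst):
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}   # bind the variable (no occurs check here)
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):   # unify constructors component-wise
            subst = unify(a, b, subst)
        return subst
    raise TypeError(f"cannot unify {t1} with {t2}")

# Unifying (a -> Int) with (String -> b) pins down both variables:
s = unify(("Fn", "a", "Int"), ("Fn", "String", "b"), {})
print(resolve("a", s), resolve("b", s))  # String Int
```

With only unification and a closed set of callable functions, the whole type of a function falls out mechanically, which is exactly why it scales in Haskell and struggles with open OO class hierarchies.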

(If you're still reading this, sorry -- I just started doing some 
research on a similar topic, so I'm pretty excited about it.)



More information about the Digitalmars-d mailing list