Algorithms, term rewriting and compile time reflection

Wed Oct 22 04:10:33 PDT 2014

Generic programming without high level analytical capabilities 
are problematic since you cannot optimize everything on a low 
level.

For instance, some algorithms run better if the input is sorted, 
and although you can cover one property with typing (e.g. sorted 
range) the combinatorial explosion is too big and the code 
becomes very messy.

The better approach is to let the compiler deduce properties of 
the input/output based on stronger assertions than 
postconditions. And it should work for FFI as well as for D. That 
way D can obtain meta-information about C functions.

Let's call it input-output-assumptions and take a look at:

y = f(x)

After calling f on x, f could provide propositions about y such 
as:

- y is null iff x is null
- length(y) == length(x) if x is not null
- y is sorted for all x if x is not null
- y contains the same elements as x
- y is alias free (copy by value)

It should also state whether the semantic side effects has been 
completely described.

Then you need statments for f that says something about how 
functions affects the properties of input like. E.g.:

z = g( f(x) )

For g we could say that it preserves:
- element order
- element values
- z is an alias free copy of the input

Thus we cannot assume length(z) == length(x), but we can assume 
that it is sorted and that z is a subset of x. That means the 
compiler can optimize expressions. E.g.:

y = f(x);
z = g(y);

inline_sort(z);  ==>  ;

x = union(z,y); ==> ;

canFind(y,v) || canFind(z,v) ==> canFind(y,v);

linear_find(x,v)  =>  binary_search(y,v)

With term rewriting and compile time reflection we can now 
perform optimizations by querying properties of expressions, 
looking up dependent results of an expression, and doing a search 
over various ways of ordering expressions.

f(sort(g(x))) can be optimized into f(g(x)) if the knowledge of 
the effects of sort() is complete and if the total effects of f,g 
and x completely covers what sort() do.

These kinds of optimizations are very difficult to achieve in a 
low level backend, but you really need them in order to do 
generic programming properly.

A simple start would be to not provide term rewriting as a 
language feature, but rather define a vocabulary that is useful 
for phobos and hardcode term rewriting for that vocabulary.

I think this is feasible.