Explicit TCE

Fri Oct 12 11:17:30 PDT 2012

> No idea what you are talking about.

I'm not sure which part wasn't clear, so I'll try to explain 
myself. Please don't feel offended if I clarify things you 
already understand.

An optimizable tail call must simply be a function call. The 
current stack frame would be replaced with the new function, so 
anything more complex than a simple function call would require 
some stack from the preceding function to stick around in the new 
function, thus requiring the old stack to stick around.

For example, te following is not optimizable the old stack (the 
one with 3) needs to be maintained until foo() returns, which is 
not TCE.

return foo() * 3

Since the old stack won't be around anymore, that leaves us with 
in a sticky situation with regard to scope():

http://dlang.org/statement.html#ScopeGuardStatement

If the current stack is going to be replaced with data from 
another function call, the behavior of scope() is undefined. The 
scope that scope() was in has now been repurpose, but the scope 
is still kind of there. If scope() is allowed, they must be 
executed just before the tail call, otherwise it will be 
overwritten (or it has to stick around until the actual stack 
frame is cleared. Consider:

void a() {
   become b();
}

void b() {
   // when does this get called?
   scope(exit) writeln("exited");
   become a();
}

If we allow scope(), then the line should be written before the 
call to a(). If we don't, then this is a compile time error. I 
like disallowing it personally, because if the scope(exit) call 
frees some memory that is passed to a, the programmer may think 
that it will be called after a exits, which may not be the case.

void a(void* arr) {
   // do something with arr
   become b();
}

void b() {
   void* arr = malloc(sizeof(float) * 16);
   scope(exit) free(arr);
   become a(arr);
}

I just see this as being a problem for those who don't fully 
understand scoping and TCE.

My mention of overhead was just how complicated it would be to 
implement. The general algorithm is (for each become keyword):

* determine max stack size (consider all branches in all 
recursive contexts)
* allocate stack size for top-level function
* do normal TCE stuff (use existing stack for new call)

The stack size should be known at compile time for cases like the 
one above (a calls b, b calls a, infinitely) to avoid infinitely 
expanding stack. A situation like this is a memory optimization, 
so forcing guaranteed stack size puts an upper-bound on memory 
usage, which is the whole point of TCE. If the stack is allowed 
to grow, there is opportunity for stack overflow.

My use case for this is a simple compiler, but I'm sure this 
could be applied to other use cases as well.  I'd like to produce 
code for some BNF-style grammar where each LHS is a function. 
Thus, my state machine wouldn't be a huge, unnatural switch 
statement that reads in the current state, but a series of code 
branches that 'become' other states, like an actual state machine.

For example:

A := B | C | hello
B := bye | see ya
C := go away

void A() {
     char next = getNext();
     if (next == 'b' || next == 's') {
         become B();
     }
     if (next == 'g') {
         become C();
     }
     if (next == 'h') {
         // consume until hello is found, or throw exception
         // then put some token on the stack
     }
}

void B() {
     // consume until 'bye' or 'see ya'
}

void C() {
     // consume until 'go away'
}

This would minimize memory use and allow me to write code that 
more closely matches the grammar. There are plenty of other use 
cases, but DSLs would be very easy to implement with TCE.