Battle-plan for CTFE

Sun May 15 04:40:40 PDT 2016

On 05/13/2016 06:32 PM, Stefan Koch wrote:
> I would like to work on a solution that does scale. The Problem is 
> not making a byteCode-interpreter. That part is relatively easy. 
> Currently I am trying to get a detailed understanding of dmd and 
> it's data-structures. (mainly it's AST.)
> 
> Generating the byte-code seems to be non-trivial.
> 
> I wonder in how far the glue layer can be of help...

Seems like I've to repeat this once more, b/c everyone including me
didn't got it in the first place. We don't need a bytecode
interpreter, it mostly adds overhead and a complicated second layer
between walking the AST and interpreting it (think of transforming a
for loop with goto into linear bytecode, almost as complicated as in
the glue layer).

What we basically need is a stack of values, a stack of frames (for
function calls and variables in scopes), and an AST visitor that does
the interpretation.
It would be most helpful for the success of this to follow common CS
examples like [¹], [²], or [³].
With realistic expectations we might have a working implementation in
a month or so. With more requirements like bytecode, using dmd's
backend, or JIT we end up with a long newsgroup discussion ;).

Tricky things for a CTFE interpreter include:

- enumerating VarDeclarations (they already have a ctfeAdrOnStack
field) in each scope, and referring to outer variables from nested scopes

At best just use a continuous stack and just set the stack pointer to
the last frame pointer when leaving a scope.

- getting function calls right

Push arguments, on return shift top of stack under arguments and pop
them (caller cleanup). If possible detect and support tail recursion.

- converting AST values to and from Interpreter Values.

Literals and constant VarExp from the AST need to be converted to an
interpreter Value before being pushed on the stack. The result of
interpretation (top of stack) needs to be converted back to an AST
literal.
Using separate data types (instead of creating AST values in the
interpreter) will be a major performance improvement over using AST
values (e.g. 16 vs. ~100 bytes). It also creates a clear boundary
between Interpreter and AST values. Currently quite some complexity is
thrown at cleaning interpreter generated AST values, and
distinguishing interpreter owned from normal AST values (which allows
some optimizations) [⁴].

We don't need a tagged union b/c the AST already contains the type
information, but a tag could be helpful for debugging [⁵].

Values can be deleted when popped from stack (no ref counting needed I
think).

- Implementing more complex data structures (arrays, strings, hash
tables, aggregates)

Use Value[], Value[Value], and a dedicated String
(char[]/wchar[]/dchar[]). For structs/classes field indexes are known
=> use fix-sized Value[]. Value.class_ needs to hold a reference to
the actual class instance for vtable calls.

Last time I was working on this (also on a bytecode interpreter) the
entry point was fairly clear [⁶] (thanks to Don).

-Martin

[¹]: [The Interpreter In An Undergraduate Compilers Course
](http://arxiv.org/pdf/1412.0426.pdf)
[²]: [L8: Interpreters &
Visitors](http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-005-elements-of-software-construction-fall-2011/lecture-notes/MIT6_005F11_lec08.pdf)
[³]: [PA 2:
Interpreter](https://sites.google.com/a/bodik.org/cs164/projects/pa2)
[⁴]: https://github.com/dlang/dmd/search?q=ownedByCtfe
[⁵]:
https://github.com/MartinNowak/dmd/blob/28ffb0ab4fa6950f60c085f33f8a2ce23df7c0cd/src/interpret.c#L73
[⁶]:
https://github.com/MartinNowak/dmd/blob/28ffb0ab4fa6950f60c085f33f8a2ce23df7c0cd/src/interpret.c#L693