D parser in tango or phobos

Wed Sep 10 14:49:33 PDT 2008

Fawzi Mohamed wrote:
> I think that having a compiler of a language written in itself is 
> certainly nice from the intellectual point of view, but not immediately 
> useful in any sense, and frankly unimportant for most people, even if it 
> gives some real benefits to the language development.

Actually, I was mulling over the idea this afternoon, and I think there 
could be some major advantages to having the compiler implemented in D.

Depending on the architecture, of course, it might be much easier to 
load the compiler as a library. Then you could do all sorts of neat 
things like compiling and loading code on the fly.

    char[] sourcecode = getSourcecodeFromSomewhere();
    ASTCodeModule myModule = parser.parse(sourcecode);

At this point, with the compiler exposing a well-defined API for all of 
its internal representations, you could add your own hooks to operate on 
AST nodes between compilation phases.

    foreach (ClassDeclaration clazz; myModule.classes) {
       FunctionDeclaration[] methods = clazz.publicFunctions;
       foreach (method; methods) {
          decorateMethodWithTraceLogging(method);
       }
    }

And, if the linker & loader were also written in D, you could take those 
runtime-parsed and dynamically-modified pieces of code and immediately 
link and load them right into the application.

    SharedLib library = compiler.toLib(myModule);

    // Maybe write the library to a file
    library.emit(`C:\path\to\my-library.lib`);

    // ...Or execute the code directly
    void delegate() entry = library.entryPoint;
    entry();

The .NET framework has some of this kind of functionality (in 
Reflection.Emit), allowing programmers to build executable code, 
opcode-by-opcode, at runtime.

The resultant code is subject to the same JIT compilation as any other 
.NET code.

A good example of its usage is the Regex implementation in the .NET 
standard library. It builds a custom function, with raw GOTO opcodes and 
everything, based on the regex string passed into the constructor at 
runtime. Consequently, the .NET regex engine is very efficient.

The same kind of thing exists in the Tango regex engine -- you can 
generate and compile D code from a regex -- but only if the regex string 
is known at compile-time.
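
To make the compile-time restriction concrete, here is a minimal sketch of 
the general technique (this is not Tango's actual regex code). It assumes a 
current D2 compiler; makeMatcher and isKeyword are hypothetical names made 
up for the illustration, and the "pattern" is just a list of '|'-separated 
literals rather than a real regex.

```d
import std.stdio;

// Hypothetical helper: builds D source for a function that tests whether
// its argument equals one of the '|'-separated alternatives in `pattern`.
// It is evaluated by CTFE, so `pattern` has to be known at compile time.
// Assumes the alternatives contain no quotes or backslashes.
string makeMatcher(string name, string pattern)
{
    string code = "bool " ~ name ~ "(string s) { return ";
    string alt;
    foreach (c; pattern)
    {
        if (c == '|')
        {
            // Close the current alternative and start the next one.
            code ~= "s == \"" ~ alt ~ "\" || ";
            alt = "";
        }
        else
        {
            alt ~= c;
        }
    }
    code ~= "s == \"" ~ alt ~ "\"; }";
    return code;
}

// The generated source is compiled as part of this module.
mixin(makeMatcher("isKeyword", "if|else|while"));

void main()
{
    assert(isKeyword("else"));
    assert(!isKeyword("elsewhere"));
    writeln(isKeyword("while")); // prints "true"
}
```

Passing a string that is only known at runtime to mixin() is a compile 
error, which is exactly the limitation described above.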

Furthermore, if the D compiler were written in D, and if it could spawn 
its own subordinate instances of the compiler on the fly, immediately 
loading compiled code into executable memory, think of how that would 
expand the power of CTFE. Any legal function would be callable at 
compile-time just as easily as at runtime.
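
For comparison, this is roughly what CTFE looks like as it stands, assuming 
a current D2 compiler: an ordinary function can already be run at 
compile-time, but only if it stays within what the compiler's built-in 
interpreter supports. The factorial function is just a stand-in example.

```d
// An ordinary function with no special annotations.
ulong factorial(uint n)
{
    ulong result = 1;
    foreach (i; 2 .. n + 1)
        result *= i;
    return result;
}

// An enum initializer must be a constant, so this call is forced
// through CTFE.
enum ulong atCompileTime = factorial(10);
static assert(atCompileTime == 3_628_800);

void main()
{
    // The same function, called at runtime with a runtime value.
    uint n = 10;
    assert(factorial(n) == atCompileTime);
}
```

A compiler that could compile and load its own output on the fly would not 
need the interpreter's restrictions in the first place.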

The opposite would be true too. You'd be able to generate and compile 
templates at runtime, potentially creating whole new Types (which has 
only ever been possible at compile-time). Admittedly, I can't think of 
any actual utility for runtime type-generation, but I'm sure someone 
more clever than me could think of some use for it.

Anyhow, those are the sorts of things that I think would become feasible 
if the D parser, compiler, linker, and loader were all written in D.

Calling the compiler dynamically from user code, or from within the 
compiler itself, could be hugely powerful.

(NOTE: I'm not actually *advocating* any of this. Just musing. There are 
plenty of reasons *not* to write the compiler in D, such as already 
having done ten years of work (more on the backend) to refine the 
existing compiler.)

--benj