tooling quality and some random rant

Sun Feb 13 10:53:41 PST 2011

Paulo Pinto wrote:
> Why C and not directly D?
> 
> It is really bad advertising for D to know that when its creator came around 
> to rewrite the linker, Walter decided to use C instead of D.

That's a very good question.

The answer is in the technical details of transitioning Optlink from an 
all-assembler project to a higher level language. I do it function by function, 
meaning there will be hundreds of "hybrid" versions that are partly in the high 
level language, partly in asm. Currently, it's around 5% in C.

1. Optlink has its own "runtime" system and startup code. With C, and a little 
knowledge about how things work under the hood, it's easier to create "headless" 
functions that require zero runtime and startup support. With D, the D compiler 
will create ModuleInfo and TypeInfo objects, which more or less rely on some 
sort of D runtime existing.

2. The group/segment names emitted by the C compiler match what Optlink uses. The 
names dmd emits match too, except that dmd emits more of them, requiring 
more of an understanding of Optlink to get them in the right places.

3. The hybrid intermediate versions require that the asm portions of Optlink be 
able to call the high level language functions. To avoid error-prone 
editing of scores of files, it is very convenient to have the function names 
used by the asm code exactly match the names emitted by the compiler. I 
accomplished this by "tweaking" the dmc C compiler. I didn't really want to mess 
with the D compiler to do the same.

4. Translating asm to a high level language starts with a rote translation, i.e. 
using gotos, raw pointers, etc., which match the assembler logic 1:1. No 
attempt is made to infer higher level logic. This makes mistakes in the 
translation easier to find. But it's not the way anyone in their right mind 
would develop C code. The higher level abstractions in C are not useful here, 
and neither are the higher level abstractions in D.
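
To illustrate points 1 and 4, here is a minimal sketch of what a rote 
translation might look like. The routine sum_bytes and its asm listing are made 
up for this example and are not taken from Optlink. The C variables are named 
after the registers they stand in for, each statement maps to one instruction, 
and the function includes no headers and calls nothing from the C library, so 
it needs no runtime or startup support.

```c
/*
 * Hypothetical asm routine (NOT from Optlink), for illustration only:
 *
 *   sum_bytes:
 *           xor   eax, eax
 *           test  ecx, ecx
 *           jz    done
 *   next:   movzx edx, byte ptr [esi]
 *           add   eax, edx
 *           inc   esi
 *           dec   ecx
 *           jnz   next
 *   done:   ret
 *
 * Rote C translation: one statement per instruction, gotos for jumps,
 * raw pointers for memory operands. No attempt at structured loops.
 */
unsigned sum_bytes(const unsigned char *esi, unsigned ecx)
{
    unsigned eax;
    unsigned edx;

    eax = 0;                /* xor   eax, eax            */
    if (ecx == 0)           /* test  ecx, ecx            */
        goto done;          /* jz    done                */
next:
    edx = *esi;             /* movzx edx, byte ptr [esi] */
    eax += edx;             /* add   eax, edx            */
    esi++;                  /* inc   esi                 */
    ecx--;                  /* dec   ecx                 */
    if (ecx != 0)
        goto next;          /* jnz   next                */
done:
    return eax;             /* ret (result in EAX)       */
}
```

Because each line can be checked against a single instruction, a mistake in 
the translation shows up as a local mismatch rather than a logic puzzle. 
Turning this into a for loop would be a separate, later step.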

Once the entire Optlink code base has been converted, then it becomes a simple 
process to:

1. Dump the Optlink runtime, and switch to the C runtime.

2. Translate the C code to D.

And then:

3. Refactor the D code into higher level abstractions.

I've converted a massive code base from asm to C++ before (DASH for Data I/O) 
and I discovered that attempting to refactor the code while translating it is 
fraught with disaster. Doing the hybrid approach is much faster and more likely 
to be successful.

TL;DR: The C version is there only as a transitional step, as it's somewhat 
easier to create a hybrid asm/C code base than a hybrid asm/D one. The goal is 
to create a D version.