optlink on multicore machines

Tue Jun 30 18:11:42 PDT 2009

Derek Parnell wrote:
> On Tue, 30 Jun 2009 20:54:55 +0200, dennis luehring wrote:
> 
>> Walter Bright schrieb:
>>> BCS wrote:
>>>> It IS running fine on 3 or 4 multicore machines around here.
>>> That's a mystery, then.
>> that's the wonderful world of hard-to-catch, hard-to-reproduce multithreading
>> problems; hope D will help here in the future
> 
> Ok then ... so optlink is going to be rewritten in D - excellent! And good
> luck to the brave developer too.
> 

Just out of curiosity... Why is a linker so hard to write?

A few years ago, I developed a small domain specific language and 
implemented its compiler, outputting bytecode for a very specialized 
(and limited purpose) virtual machine.

In my case, I decided it was easier to give good error messages if the 
compiler & linker were a single entity. I've always been annoyed by the 
discrepancy between compilers and linkers (mostly because build tools 
have their own special languages, pointlessly different from the 
development language). So my compiler combined compilation and linking 
into a single step.

Every time the compiler encountered an "import" statement, it checked to 
see whether a symbol table existed for the imported module and, if not, 
it added the module to the parse queue. After processing a new module, 
it would add the resulting code to a namespace-aware symbol table for 
the given module.
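To make the import-driven parse queue concrete, here is a minimal sketch, assuming a hypothetical toy representation in which each "source" module simply lists what it imports and what it defines (the SOURCES dict and load_program name are illustrative, not from my actual compiler). A module is queued the first time any import names it, and each parsed module gets its own namespaced symbol table.

```python
from collections import deque

# Hypothetical toy "sources": each module lists the modules it imports
# and the symbols it defines. A real front end would parse text instead.
SOURCES = {
    "main": {"imports": ["util", "io"], "defines": ["main"]},
    "util": {"imports": ["io"], "defines": ["clamp", "lerp"]},
    "io":   {"imports": [], "defines": ["print_line"]},
}

def load_program(entry):
    """Process modules breadth-first, starting from the entry module."""
    symbol_tables = {}      # module name -> {local symbol: qualified name}
    queue = deque([entry])
    queued = {entry}        # modules that already have (or will get) a table
    while queue:
        module = queue.popleft()
        src = SOURCES[module]            # stands in for "parse this module"
        for imported in src["imports"]:
            if imported not in queued:   # no symbol table yet: enqueue it
                queued.add(imported)
                queue.append(imported)
        # Namespace-aware table: every symbol is qualified by its module.
        symbol_tables[module] = {name: f"{module}.{name}" for name in src["defines"]}
    return symbol_tables

tables = load_program("main")
print(list(tables))  # ['main', 'util', 'io']
```

Tracking "queued" rather than only "has a table" keeps a module that is already waiting in the queue from being added twice when several modules import it.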

Once the parse queue was empty, I checked for unresolved symbols, cyclic 
dependency errors, etc. If there were no other referential errors (and 
if all the other semantic checks passed), then I'd start the 
code-generation process at the main entry point. The whole program was 
represented as a DAG, and writing bytecode was as simple as traversing 
that graph. Since the "linking" behavior was built right into the 
compiler, it was a piece of cake.
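The check-then-generate phase can be sketched the same way, assuming a hypothetical whole-program graph where each qualified definition maps to the names it references (GRAPH, check_unresolved, and emit_from are illustrative names, and the "<code for ...>" strings stand in for real bytecode). Following the description above, this sketch treats any cycle as an error and emits code by a depth-first walk from the entry point, so dependencies come out before the code that uses them.

```python
# Hypothetical whole-program graph: definition -> names it references.
GRAPH = {
    "main.main":     ["util.clamp", "io.print_line"],
    "util.clamp":    [],
    "util.lerp":     ["util.clamp"],
    "io.print_line": [],
}

def check_unresolved(graph):
    """Report every reference that no module defines."""
    missing = sorted({ref for refs in graph.values() for ref in refs if ref not in graph})
    if missing:
        raise NameError(f"unresolved symbols: {missing}")

def emit_from(entry, graph):
    """Depth-first walk from the entry point; a back edge means a cycle."""
    VISITING, DONE = 1, 2
    state = {}
    output = []

    def visit(name):
        if state.get(name) == DONE:
            return
        if state.get(name) == VISITING:
            raise ValueError(f"cyclic dependency through {name}")
        state[name] = VISITING
        for ref in graph[name]:
            visit(ref)
        state[name] = DONE
        output.append(f"<code for {name}>")  # placeholder for real bytecode

    visit(entry)
    return output

check_unresolved(GRAPH)
print(emit_from("main.main", GRAPH))
# ['<code for util.clamp>', '<code for io.print_line>', '<code for main.main>']
```

Because generation starts at the entry point, a definition nothing reaches (util.lerp here) is never emitted at all.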

Anyhow...

Whenever someone on the NG complains about optlink, the inevitable 
conclusion is that it would be a huge undertaking to produce a new or 
improved linker.

Why?

Seems to me that a new linker implementation would be relatively 
straightforward. There are really only three steps:

1) Parse object files.
2) Create DAG structures using references in those object files.
3) Walk the graph, copying the code (with rewritten addresses) into the 
final executable.
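Here is what those three steps look like for a deliberately tiny, hypothetical object format (not OMF, COFF, or ELF): each object carries raw code bytes, the symbols it defines at offsets within that code, and relocations of a single kind, "write the absolute address of SYMBOL as 4 little-endian bytes at OFFSET." The OBJECTS data, the link function, and the 0x1000 base address are all assumptions made for illustration; the "parse" step is reduced to indexing objects that are already in memory.

```python
import struct

# Hypothetical toy object files. Code bytes are arbitrary placeholders;
# the four zero bytes in main.obj are a slot to be patched.
OBJECTS = {
    "main.obj":   {"code": b"\xAA\x00\x00\x00\x00\xBB",
                   "defines": {"_main": 0}, "relocs": [(1, "_helper")]},
    "helper.obj": {"code": b"\x10\x20\x30",
                   "defines": {"_helper": 0}, "relocs": []},
    "unused.obj": {"code": b"\x40",
                   "defines": {"_unused": 0}, "relocs": []},
}

def link(objects, entry, base=0x1000):
    # Step 1 ("parse"): index which object defines each symbol.
    owner = {}
    for obj_name, obj in objects.items():
        for sym in obj["defines"]:
            if sym in owner:
                raise ValueError(f"duplicate symbol {sym}")
            owner[sym] = obj_name

    # Step 2: follow references from the entry symbol to collect
    # the objects the program actually needs.
    needed, stack = [], [owner[entry]]
    while stack:
        obj_name = stack.pop()
        if obj_name in needed:
            continue
        needed.append(obj_name)
        for _, sym in objects[obj_name]["relocs"]:
            if sym not in owner:
                raise NameError(f"unresolved symbol {sym}")
            stack.append(owner[sym])

    # Step 3: assign each needed object an address, then copy its code
    # into the image with every relocation slot patched.
    start, addr = {}, base
    for obj_name in needed:
        start[obj_name] = addr
        addr += len(objects[obj_name]["code"])
    image = bytearray()
    for obj_name in needed:
        code = bytearray(objects[obj_name]["code"])
        for offset, sym in objects[obj_name]["relocs"]:
            target = start[owner[sym]] + objects[owner[sym]]["defines"][sym]
            code[offset:offset + 4] = struct.pack("<I", target)
        image += code
    return bytes(image), start

image, start = link(OBJECTS, "_main")
print(image.hex())  # aa06100000bb102030
```

This toy handles one section per object and one absolute relocation kind; real formats such as OMF and PE/COFF define many more record and relocation types.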

Is it really more complex than that? What am I missing?

(Caveat: I don't know much about Windows PE, or any of the many other 
object file formats. Still, though... it doesn't seem like it could be 
THAT difficult. The compiler has already done most of the tricky stuff.)

--benji