Compilation strategy

Jacob Carlborg doob at me.com
Mon Dec 17 23:38:13 PST 2012


On 2012-12-18 01:13, H. S. Teoh wrote:

> The problem is not so much the structure preprocessor -> compiler ->
> assembler -> linker; the problem is that these logical stages have been
> arbitrarily assigned to individual processes residing in their own
> address space, communicating via files (or pipes, whatever it may be).
>
> The fact that they are separate processes is in itself not that big of a
> problem, but the fact that they reside in their own address space is a
> big problem, because you cannot pass any information down the chain
> except through rudimentary OS interfaces like files and pipes. Even that
> wouldn't have been so bad if it weren't for the fact that user
> interface (in the form of text input / object file format) has also been
> conflated with program interface (the compiler has to produce the input
> to the assembler, in *text*, and the assembler has to produce object
> files that do not encode any direct dependency information because
> that's the standard file format the linker expects).
>
> Now consider if we keep the same stages, but each stage is not a
> separate program but a *library*. The code then might look, in greatly
> simplified form, something like this:
>
> 	import compiler = libdmd.compiler;
> 	import assembler = libdmd.assembler;
> 	import linker = libdmd.linker;
>
> 	void main(string[] args) {
> 		// typeof(asmCode) is some arbitrarily complex data
> 		// structure encoding assembly code, inter-module
> 		// dependencies, etc.
> 		auto asmCode = compiler.lex(args)
> 			.parse()
> 			.optimize()
> 			.codegen();
>
> 		// Note: no stupid redundant convert to string, parse,
> 		// convert back to internal representation.
> 		auto objectCode = assembler.assemble(asmCode);
>
> 		// Note: linker has direct access to dependency info,
> 		// etc., carried over from asmCode -> objectCode.
> 		auto executable = linker.link(objectCode);
> 		import std.stdio : File;
> 		auto outfile = "a.out"; // hypothetical output path
> 		auto output = File(outfile, "w");
> 		executable.generate(output);
> 	}
>
> Note that the types of asmCode, objectCode, and executable are arbitrarily
> complex, and may contain lazily evaluated data structures, references to
> on-disk temporary storage (for large projects you can't hold everything
> in RAM), etc. Dependency information in asmCode is propagated to
> objectCode, as necessary. The linker has full access to all info the
> compiler has access to, and can perform inter-module optimization, etc.,
> by accessing information available to the *compiler* front-end, not just
> some crippled object file format.
>
> The root of the current nonsense is that perfectly fine data structures
> are arbitrarily required to be flattened into some kind of intermediate
> form, written to some file (or sent down some pipe), often with loss of
> information, then read from the other end, interpreted, and
> reconstituted into other data structures (with incomplete info), then
> processed. In many cases, information that didn't make it through the
> channel has to be reconstructed (often imperfectly), and then used. Most
> of these steps are redundant. If the compiler data structures were
> already directly available in the first place, none of this baroque
> dance is necessary.

I couldn't agree more.
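A minimal, self-contained D sketch of the same idea, assuming a toy "module x; import y;" source syntax. The Module and ObjectCode types and the compile/assemble/link functions are hypothetical and unrelated to DMD's real internals. Each stage is an ordinary function in one process, and the import list recorded by the front end reaches the linker as data, so the linker can derive a dependency-first link order without re-parsing anything.

```d
import std.algorithm.iteration : map, splitter;
import std.algorithm.searching : startsWith;
import std.array : array;
import std.stdio : writeln;
import std.string : strip;

// Hypothetical toy IR; not DMD's actual data structures.
struct Module
{
    string name;
    string[] imports; // dependency info recorded by the front end
}

struct ObjectCode
{
    string name;
    string[] imports;    // carried over from Module as-is, never re-parsed
    ubyte[] machineCode; // placeholder; left empty in this sketch
}

// Front end: parses the toy syntax "module a; import b; import c;".
Module compile(string source)
{
    Module m;
    foreach (stmt; source.splitter(';'))
    {
        auto s = stmt.strip;
        if (s.startsWith("module "))
            m.name = s["module ".length .. $].strip;
        else if (s.startsWith("import "))
            m.imports ~= s["import ".length .. $].strip;
    }
    return m;
}

// Back end: the dependency list passes straight through as data.
ObjectCode assemble(Module m)
{
    return ObjectCode(m.name, m.imports);
}

// Linker: uses the carried-over imports to order modules so that,
// assuming no import cycles, every module follows the modules it imports.
string[] link(ObjectCode[] objs)
{
    ObjectCode[string] byName;
    foreach (o; objs)
        byName[o.name] = o;

    string[] order;
    bool[string] visited;

    void visit(string name)
    {
        if (name in visited)
            return;
        visited[name] = true;
        if (auto p = name in byName)
        {
            foreach (dep; p.imports)
                visit(dep);
            order ~= name;
        }
    }

    foreach (o; objs)
        visit(o.name);
    return order;
}

void main()
{
    auto sources = [
        "module app; import util; import io;",
        "module util; import io;",
        "module io;",
    ];
    auto objs = sources.map!(s => assemble(compile(s))).array;
    writeln(link(objs)); // prints ["io", "util", "app"]
}
```

Because the stages share one address space, adding a field to Module (say, a list of inlining candidates) would make it visible to the linker without changing any file format.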

-- 
/Jacob Carlborg

