proposal: lazy compilation model for compiling binaries

Timothee Cour thelastmammoth at gmail.com
Fri Jun 21 21:45:18 PDT 2013


A)
Currently, D suffers from a high degree of interdependency between modules:
when one wants to use a single symbol (say std.range.isInputRange), we
pull in all of std.range, which in turn pulls in all of
std.array, std.string, etc. This results in slow compile times (relative
to the case where we didn't have to pull all this in) and fat binaries: see
the example in point "D)" below.

This has been discussed many times before, and some people have suggested
breaking modules into submodules such as std.range.traits to mitigate
this a little. However, this requires people to change 'import std.range'
to 'import std.range.traits' to benefit from it, and in many cases
it will still be ineffective.

B)
I'd like to propose something different that can potentially dramatically
reduce compile time/binary size, while not requiring users to scar their
source code as above.

In short: perform semantic analysis of a function/template/struct/class
on demand, only if that symbol is reached starting from main().

In more detail:
suppose we compile a binary (dmd -ofmain foo1.d foo2.d main.d).
1. Input files are lexed and parsed (so the code must be syntactically valid).
2. Semantic analysis is performed at module level, but does not descend into
   function/template/struct/class declarations.
3. The main() symbol is located in the symbol table.
4. Lazy semantic analysis starts from the main() function and propagates
   using a breadth-first search (BFS) strategy: the body, return type and
   template constraints of a symbol (function/template/struct/class) are
   semantically analyzed only when that symbol is encountered along the
   BFS path.
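
The BFS propagation described above can be sketched as a plain reachability walk. This is only an illustration under stated assumptions: the function reachableFrom, the string-keyed dependency graph and the symbol names are all hypothetical, and a real compiler would work on its own symbol table rather than on strings.

```d
import std.stdio : writeln;

// Hypothetical model: each symbol maps to the symbols its body references.
// Returns the symbols that would be semantically analyzed, in BFS order.
string[] reachableFrom(string root, string[][string] deps)
{
    bool[string] seen = [root: true];
    string[] queue = [root];

    // queue doubles as the visit order; head indexes the next symbol to analyze
    for (size_t head = 0; head < queue.length; ++head)
    {
        auto sym = queue[head];
        if (auto refs = sym in deps)      // symbols without an entry reference nothing
        {
            foreach (r; *refs)
            {
                if (r !in seen)
                {
                    seen[r] = true;
                    queue ~= r;
                }
            }
        }
    }
    return queue;
}

void main()
{
    string[][string] deps = [
        "main":   ["foo", "bar"],
        "foo":    ["baz"],
        "unused": ["y"]   // never reached from main, so never analyzed
    ];
    writeln(reachableFrom("main", deps)); // ["main", "foo", "bar", "baz"]
}
```

Anything not in the returned list (here "unused") would be lexed and parsed but never semantically analyzed.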

This strategy could be enabled by a switch -lazy_compilation in dmd. The
only time it would differ from the existing compilation model is when
some unused code triggers a compile error, e.g.:
----
void foo(){int x=y;}
void main(){}
----
dmd main.d                    //error: y is undefined
dmd -lazy_compilation main.d  //OK: foo is never reached from main(), so accept

This would be very useful to speed up the edit/compile/debug cycle.
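
For comparison, D already behaves this way for templates: a template body is semantically analyzed only when the template is instantiated. The sketch below turns foo from the example above into a zero-parameter template purely as an illustration; it is not part of the proposal, and it should compile without any new switch because foo is never instantiated.

```d
// foo is now a template function (note the extra empty parameter list).
// Its body is parsed, but not semantically analyzed until it is instantiated.
void foo()() { int x = y; } // y is undefined, yet this is accepted

void main() {} // foo is never instantiated, so no error is reported
```

The proposal would effectively extend this on-demand treatment to ordinary functions, structs and classes.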

Example 2:
----
auto foo(){return "import std.stdio;";}
mixin(foo);
void fun2(){import b;}
void main(){writeln("ok");}
----
Lazy semantic analysis will analyze main and foo but not fun2, which is not
used. foo is analyzed because it is used in a module-level mixin
declaration.

C)
Caveats:
- This works when compiling binaries, as we know which symbols end up in
  the final binary.
- For compiling libraries (shared via -shared, or static via -lib), it works
  if we have a way to specify which symbols are meant to be exported (e.g.
  https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html).
  Is there currently such a mechanism?

We could specify the list of symbols to export to dmd via a command line
flag.

This could be:
dmd -exported_symbols=filename.d main.d bar.d
with filename.d containing all exported symbols, eg:
----
module exported_symbols;
public import foo;     //exports all symbols from foo
public import bar:baz; //exports just bar.baz
void fun(){}           //exports fun
----
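
As a possible alternative to a separate file, D already has an export attribute that marks a declaration as accessible from outside the executable or shared library. Whether dmd could treat such declarations as the roots for lazy analysis is an assumption here, not something the compiler does today, and the names below are hypothetical.

```d
// Hypothetical library source: names are for illustration only.
export void fun() { helper(); } // marked export: would act as a root (assumption)
void helper() {}                // not export: analyzed only because fun reaches it
void unused() {}                // not export and unreachable: would be skipped
```

With this approach the set of roots lives next to the declarations instead of in a separate exported-symbols file.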


D)
Example showing problem with current situation:
----
module main;
version(A)
import std.range;
else{
      //copy paste here body of 'isInputRange' from std.range
}
void fun(){ auto a=isInputRange!string;}
----
dmd -c main.d:
nm main.o|wc -l: 8
file size of main.o: 1.1K
cpu time (10 runs): 0.119 s

dmd -c -version=A main.d:
nm main.o|wc -l: 324 => 40X
file size of main.o: 72K => 70X
cpu time (10 runs): 2.7 s => 23X

Q: Why do we care about compilation speed etc., since dmd is already fast?
A1: There are many cases where it matters, e.g. the REPL I'm working on, which
requires compiling on the fly and needs interactive speed.
A2: For large projects, where compilation can become slow.