Program size, linking matter, and static this()

Martin Nowak dawg at dawgfoto.de
Fri Dec 16 22:09:50 PST 2011


On Fri, 16 Dec 2011 19:29:18 +0100, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> Hello,
>
>
> Late last night Walter and I figured out a few interesting tidbits of  
> information. Allow me to give some context, discuss them, and sketch a  
> few approaches for improving things.
>
> A while ago Walter wanted to enable function-level linking, i.e. only  
> get the needed functions from a given (and presumably large) module. So  
> he arranged things that a library contains many small object "files"  
> (that actually are generated from a single .d file and never exist on  
> disk, only inside the library file, which can be considered an archive  
> like tar). Then the linker would only pick the used object "files" from  
> the library and link those in. Unfortunately that didn't have nearly the  
> expected impact - essentially the size of most binaries stayed the same.  
> The mystery was unsolved, and Walter needed to move on to other things.
>
> One particularly annoying issue is that even programs that don't  
> ostensibly use anything from an imported module may balloon inexplicably  
> in size. Consider:
>
> import std.path;
> void main(){}
>
> This program, after stripping and all, is some 750KB in size. Removing  
> the import line reduces the size to 218KB. That includes the runtime  
> support, garbage collector, and such, and I'll consider it a baseline.  
> (A similar but separate discussion could be focused on reducing the  
> baseline size, but herein I'll consider it constant.)
>
> What we'd simply want is to be able to import stuff without blatantly  
> paying for what we don't use. If a program imports std.path and uses no  
> function from it, it should be as large as a program without the import.  
> Furthermore, the increase should be incremental - using 2-3 functions  
> from std.path should only increase the executable size by a little, not  
> suddenly link in all code in that module.
>
> But in experiments it seemed like program size would increase in sudden  
> amounts when certain modules were included. After much investigation we  
> figured out that the following fateful causal sequence happened:
>
> 1. Some modules define static constructors with "static this()" or  
> "shared static this()", and/or static destructors.
>
> 2. These constructors/destructors are linked in automatically whenever a  
> module is included.
>
> 3. Importing a module with a static constructor (or destructor) will  
> generate its ModuleInfo structure, which contains static information  
> about all module members. In particular, it keeps virtual table pointers  
> for all classes defined inside the module.
>
> 4. That means the generated ModuleInfo refers to all virtual functions  
> defined in that module, whether they're used or not.
>
> 5. The phenomenon is transitive, e.g. even if std.path has no static  
> constructors but imports std.datetime which does, a ModuleInfo is  
> generated for std.path too, in addition to the one for std.datetime. So  
> now classes inside std.path (if any) will be all linked in.
>
> 6. It follows that a module that defines classes which in turn use other  
> functions in other modules, and has static constructors (or includes  
> other modules that do) will balloon the size of the executable suddenly.
>
> There are a few approaches that we can use to improve the state of  
> affairs.
>
> A. On the library side, use static constructors and destructors  
> sparingly inside druntime and std. We can use lazy initialization  
> instead of compulsively initializing library internals. I think this is  
> often a worthy thing to do in any case (dynamic libraries etc) because  
> it only does work if and when work needs to be done at the small cost of  
> a check upon each use.
>
> B. On the compiler side, we could use a similar lazy initialization  
> trick to only refer to class methods in the module if they're actually  
> needed. I'm being vague here because I'm not sure what and how that can  
> be done.
>
> Here's a list of all files in std using static cdtors:
>
> std/__fileinit.d
> std/concurrency.d
> std/cpuid.d
> std/cstream.d
> std/datebase.d
> std/datetime.d
> std/encoding.d
> std/internal/math/biguintcore.d
> std/internal/math/biguintx86.d
> std/internal/processinit.d
> std/internal/windows/advapi32.d
> std/mmfile.d
> std/parallelism.d
> std/perf.d
> std/socket.d
> std/stdiobase.d
> std/uri.d
>
> The majority of them don't do a lot of work and are not much used inside  
> phobos, so they don't blow up the executable. The main one that could  
> receive some attention is std.datetime. It has a few static ctors and a  
> lot of classes. Essentially just importing std.datetime or any std  
> module that transitively imports std.datetime (and there are many of  
> them) ends up linking in most of Phobos and blows the size up from the  
> 218KB baseline to 700KB.
>
> Jonathan, could I impose on you to replace all static cdtors in  
> std.datetime with lazy initialization? I looked through it and it  
> strikes me as a reasonably simple job, but I think you'd know better  
> what to do than me.
>
> A similar effort could be conducted to reduce or eliminate static cdtors  
> from druntime. I made the experiment of commenting them all out, and that  
> reduced the size of the baseline from 218KB to 200KB. This is a good  
> amount, but not as dramatic as what we can get by working on  
> std.datetime.
>
>
> Thanks,
>
> Andrei

We'd need the linker to do any of this. Unreferenced symbols should be
emitted with some kind of vague linkage (multiobj partly does this).
Anything that references everything, like ModuleInfo, should only create
weak references. This implies that localClasses might contain only part of
the classes actually defined in the module. People can use the designated
export attribute to force unused symbols to be emitted.
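
For reference, approach A from the quoted message (lazy initialization in
place of a static constructor) could look roughly like the sketch below.
It is not taken from Phobos: the module name, buildSquares, squaresCache
and square are all made up for illustration, and the table is assumed to
be thread-local, so no locking is needed.

```d
module lazyinit_sketch; // hypothetical module name

// Eager version, shown only for contrast. The static constructor runs at
// startup whether or not the table is ever used:
//
//     private int[] squaresTable;
//     static this() { squaresTable = buildSquares(); }

// Hypothetical helper that stands in for an expensive initialization.
private int[] buildSquares()
{
    auto table = new int[](16);
    foreach (i, ref e; table)
        e = cast(int)(i * i);
    return table;
}

// Module-level variables are thread-local by default in D, so each thread
// fills its own cache on first use.
private int[] squaresCache;

// Lazy version: the table is built the first time it is needed, at the
// cost of one null check per call.
int square(size_t i)
{
    if (squaresCache is null)
        squaresCache = buildSquares();
    return squaresCache[i];
}

void main()
{
    assert(square(3) == 9);
    assert(square(15) == 225);
}
```

With no static constructor left, nothing has to run at startup for this
module. A table shared across threads would need synchronization around
the first initialization, which this sketch leaves out.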

