Program size, linking matter, and static this()

Steven Schveighoffer schveiguy at yahoo.com
Fri Dec 16 11:23:22 PST 2011


On Fri, 16 Dec 2011 13:29:18 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> Hello,
>
>
> Late last night Walter and I figured a few interesting tidbits of  
> information. Allow me to give some context, discuss them, and sketch a  
> few approaches for improving things.
>
> A while ago Walter wanted to enable function-level linking, i.e. only  
> get the needed functions from a given (and presumably large) module. So  
> he arranged things that a library contains many small object "files"  
> (that actually are generated from a single .d file and never exist on  
> disk, only inside the library file, which can be considered an archive  
> like tar). Then the linker would only pick the used object "files" from  
> the library and link those in. Unfortunately that didn't have nearly the  
> expected impact - essentially the size of most binaries stayed the same.  
> The mystery was unsolved, and Walter needed to move on to other things.
>
> One particularly annoying issue is that even programs that don't  
> ostensibly use anything from an imported module may balloon inexplicably  
> in size. Consider:
>
> import std.path;
> void main(){}
>
> This program, after stripping and all, is some 750KB in size. Removing
> the import line reduces the size to 218KB. That includes the runtime  
> support, garbage collector, and such, and I'll consider it a baseline.  
> (A similar but separate discussion could be focused on reducing the  
> baseline size, but herein I'll consider it constant.)
>
> What we'd simply want is to be able to import stuff without blatantly  
> paying for what we don't use. If a program imports std.path and uses no  
> function from it, it should be as large as a program without the import.  
> Furthermore, the increase should be incremental - using 2-3 functions  
> from std.path should only increase the executable size by a little, not  
> suddenly link in all code in that module.
>
> But in experiments it seemed like program size would increase in sudden  
> amounts when certain modules were included. After much investigation we  
> figured that the following fateful causal sequence happened:
>
> 1. Some modules define static constructors with "static this()" or
> "shared static this()", and/or static destructors.
>
> 2. These constructors/destructors are linked in automatically whenever a  
> module is included.
>
> 3. Importing a module with a static constructor (or destructor) will  
> generate its ModuleInfo structure, which contains static information  
> about all module members. In particular, it keeps virtual table pointers  
> for all classes defined inside the module.
>
> 4. That means generating ModuleInfo references all virtual functions defined
> in that module, whether they're used or not.
>
> 5. The phenomenon is transitive, e.g. even if std.path has no static  
> constructors but imports std.datetime which does, a ModuleInfo is  
> generated for std.path too, in addition to the one for std.datetime. So  
> now classes inside std.path (if any) will be all linked in.
>
> 6. It follows that a module that defines classes which in turn use other  
> functions in other modules, and has static constructors (or includes  
> other modules that do) will balloon the size of the executable suddenly.
>
> There are a few approaches that we can use to improve the state of  
> affairs.
>
> A. On the library side, use static constructors and destructors  
> sparingly inside druntime and std. We can use lazy initialization  
> instead of compulsively initializing library internals. I think this is  
> often a worthy thing to do in any case (dynamic libraries etc) because  
> it only does work if and when work needs to be done at the small cost of  
> a check upon each use.
>
> B. On the compiler side, we could use a similar lazy initialization
> trick to only reference class methods in the module if they're actually
> needed. I'm being vague here because I'm not sure what and how that can  
> be done.
>
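
(For readers following along, here is a minimal sketch of the lazy
initialization described in A above. The names table, buildTable, and
square are made up for illustration and are not anything in druntime or
Phobos. The commented-out static constructor is the eager form that, per
the quoted analysis, pulls in ModuleInfo.)

```d
import std.stdio;

// Hypothetical module-level state standing in for "library internals".
// Module-level variables are thread-local by default in D2.
private int[] table;

// Eager form that approach A suggests avoiding:
// static this() { table = buildTable(); }

// Builds a small table of squares (illustrative work only).
private int[] buildTable()
{
    auto t = new int[](256);
    foreach (i, ref v; t)
        v = cast(int)(i * i);
    return t;
}

// Lazy form: the table is built on first use, at the cost of one
// null check on every call.
int square(ubyte x)
{
    if (table is null)
        table = buildTable();
    return table[x];
}

void main()
{
    writeln(square(12));
}
```

The trade-off is the one stated in A: no work at startup and no static
constructor in the module, in exchange for a check upon each use.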

I disagree with this assessment.  It's good to know the cause of the  
problem, but let's look at the root issue -- reflection.  The only reason  
to include class information for classes not being referenced is to be  
able to construct/use classes at runtime instead of at compile time.  But  
if you look at D's runtime reflection capabilities, they are quite poor.   
You can only construct a class at runtime if it has a zero-arg constructor.

So essentially, we are paying the penalty of having runtime reflection in
terms of bloat, but get very little benefit.
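
To make the zero-arg limitation concrete, here is a small sketch using
Object.factory from druntime, which looks a class up by its fully
qualified name. The module name reflectdemo and the classes Plain and
NeedsArg are hypothetical, chosen only for this example.

```d
module reflectdemo;

import std.stdio;

// Has a zero-argument constructor, so it can be created by name.
class Plain
{
    this() {}
    string name() { return "Plain"; }
}

// Only has a constructor that takes an argument.
class NeedsArg
{
    int x;
    this(int x) { this.x = x; }
}

void main()
{
    // Lookup is by fully qualified name: module name, then class name.
    Object a = Object.factory("reflectdemo.Plain");
    assert(a !is null);
    writeln((cast(Plain) a).name());

    // No default constructor available, so the factory returns null.
    Object b = Object.factory("reflectdemo.NeedsArg");
    assert(b is null);
}
```

There is no way to pass constructor arguments through this interface,
which is the limitation I mean.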

I think there are two things that need to be considered:

1. We eventually should have some reasonably complete runtime reflection  
capability
2. Runtime reflection and shared libraries go hand-in-hand.  With shared  
library support, the bloat penalty isn't nearly as significant.

I don't think the right answer is to avoid using features of the language  
because the compiler/runtime has some design deficiencies.  At some point  
these deficiencies will be fixed, and then we are left with a library that  
has seemingly odd design choices that we can't change.

-Steve

