Program size, linking matter, and static this()
Steven Schveighoffer
schveiguy at yahoo.com
Fri Dec 16 11:23:22 PST 2011
On Fri, 16 Dec 2011 13:29:18 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> Hello,
>
>
> Late last night Walter and I figured a few interesting tidbits of
> information. Allow me to give some context, discuss them, and sketch a
> few approaches for improving things.
>
> A while ago Walter wanted to enable function-level linking, i.e. only
> get the needed functions from a given (and presumably large) module. So
> he arranged things so that a library contains many small object "files"
> (that actually are generated from a single .d file and never exist on
> disk, only inside the library file, which can be considered an archive
> like tar). Then the linker would only pick the used object "files" from
> the library and link those in. Unfortunately that didn't have nearly the
> expected impact - essentially the size of most binaries stayed the same.
> The mystery was unsolved, and Walter needed to move on to other things.
>
> One particularly annoying issue is that even programs that don't
> ostensibly use anything from an imported module may balloon inexplicably
> in size. Consider:
>
> import std.path;
> void main(){}
>
> This program, after stripping and all, is some 750KB in size. Removing
> the import line reduces the size to 218KB. That includes the runtime
> support, garbage collector, and such, and I'll consider it a baseline.
> (A similar but separate discussion could be focused on reducing the
> baseline size, but herein I'll consider it constant.)
>
> What we'd simply want is to be able to import stuff without blatantly
> paying for what we don't use. If a program imports std.path and uses no
> function from it, it should be as large as a program without the import.
> Furthermore, the increase should be incremental - using 2-3 functions
> from std.path should only increase the executable size by a little, not
> suddenly link in all code in that module.
>
> But in experiments it seemed like program size would increase in sudden
> amounts when certain modules were included. After much investigation we
> figured that the following fateful causal sequence happened:
>
> 1. Some modules define static constructors with "static this()" or
> "shared static this()", and/or static destructors.
>
> 2. These constructors/destructors are linked in automatically whenever a
> module is included.
>
> 3. Importing a module with a static constructor (or destructor) will
> generate its ModuleInfo structure, which contains static information
> about all module members. In particular, it keeps virtual table pointers
> for all classes defined inside the module.
>
> 4. That means generating ModuleInfo references all virtual functions defined
> in that module, whether they're used or not.
>
> 5. The phenomenon is transitive, e.g. even if std.path has no static
> constructors but imports std.datetime which does, a ModuleInfo is
> generated for std.path too, in addition to the one for std.datetime. So
> now classes inside std.path (if any) will be all linked in.
>
> 6. It follows that a module that defines classes which in turn use other
> functions in other modules, and has static constructors (or includes
> other modules that do) will balloon the size of the executable suddenly.
>
> There are a few approaches that we can use to improve the state of
> affairs.
>
> A. On the library side, use static constructors and destructors
> sparingly inside druntime and std. We can use lazy initialization
> instead of compulsively initializing library internals. I think this is
> often a worthy thing to do in any case (dynamic libraries etc) because
> it only does work if and when work needs to be done at the small cost of
> a check upon each use.
>
> B. On the compiler side, we could use a similar lazy initialization
> trick to only refer to class methods in the module if they're actually
> needed. I'm being vague here because I'm not sure what and how that can
> be done.
>
I disagree with this assessment. It's good to know the cause of the
problem, but let's look at the root issue -- reflection. The only reason
to include class information for classes not being referenced is to be
able to construct/use classes at runtime instead of at compile time. But
if you look at D's runtime reflection capabilities, they are quite poor.
You can only construct a class at runtime if it has a zero-arg constructor.
So essentially, we are paying the penalty of having runtime reflection in
terms of bloat, but get very very little benefit.
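
To make the zero-arg limitation concrete, here is a minimal sketch using
Object.factory from druntime. The module and class names (reflect_demo,
Plain, NeedsArg) are made up for illustration. As far as I understand the
documented behavior, the factory returns null when the class defines
constructors but none of them takes zero arguments.

```d
module reflect_demo; // hypothetical module name, used in the lookups below

import std.stdio;

class Plain
{
    this() {}                      // zero-arg constructor
}

class NeedsArg
{
    int x;
    this(int x) { this.x = x; }    // only a one-arg constructor
}

void main()
{
    // Object.factory looks a class up by its fully qualified name
    // and default-constructs it.
    Object a = Object.factory("reflect_demo.Plain");
    Object b = Object.factory("reflect_demo.NeedsArg");

    writeln("Plain constructed:    ", a !is null);
    writeln("NeedsArg constructed: ", b !is null);
}
```

There is no way to hand the int argument to NeedsArg through this
interface, which is about as far as construction by name goes today.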

I think there are two things that need to be considered:

1. We eventually should have some reasonably complete runtime reflection
capability
2. Runtime reflection and shared libraries go hand-in-hand. With shared
library support, the bloat penalty isn't nearly as significant.

I don't think the right answer is to avoid using features of the language
because the compiler/runtime has some design deficiencies. At some point
these deficiencies will be fixed, and then we are left with a library that
has seemingly odd design choices that we can't change.
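
For concreteness, the two styles in question (a module constructor versus
the lazy initialization suggested in option A of the quoted message) might
look roughly like the sketch below. All names (lazy_demo, buildTable,
eagerTable, lookupTable) are hypothetical and not taken from druntime or
Phobos.

```d
module lazy_demo; // hypothetical module name

// Shared helper that builds the data both styles need.
private int[string] buildTable()
{
    return ["one": 1, "two": 2];
}

// Style 1: language feature. The module constructor runs before main()
// in every program that links this module.
private int[string] eagerTable;
static this()
{
    eagerTable = buildTable();
}

// Style 2: lazy initialization. Nothing runs at startup; each call pays
// for one boolean check. Module-level variables are thread-local in D2,
// so no locking is needed for this sketch.
private int[string] cachedTable;
private bool tableReady;

int[string] lookupTable()
{
    if (!tableReady)
    {
        cachedTable = buildTable();
        tableReady = true;
    }
    return cachedTable;
}

void main()
{
    assert(eagerTable["one"] == 1);     // filled in by static this()
    assert(lookupTable()["two"] == 2);  // filled in on first use
}
```

The second style works, but it is the kind of code that would look odd
once the underlying deficiency is gone.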

-Steve