Program size, linking matter, and static this()

Martin Nowak dawg at dawgfoto.de
Fri Dec 16 22:27:27 PST 2011


On Sat, 17 Dec 2011 07:09:50 +0100, Martin Nowak <dawg at dawgfoto.de> wrote:

> On Fri, 16 Dec 2011 19:29:18 +0100, Andrei Alexandrescu  
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> Hello,
>>
>>
>> Late last night Walter and I figured out a few interesting tidbits of  
>> information. Allow me to give some context, discuss them, and sketch a  
>> few approaches for improving things.
>>
>> A while ago Walter wanted to enable function-level linking, i.e. only  
>> get the needed functions from a given (and presumably large) module. So  
>> he arranged things so that a library contains many small object "files"  
>> (that actually are generated from a single .d file and never exist on  
>> disk, only inside the library file, which can be considered an archive  
>> like tar). Then the linker would only pick the used object "files" from  
>> the library and link those in. Unfortunately that didn't have nearly  
>> the expected impact - essentially the size of most binaries stayed the  
>> same. The mystery was unsolved, and Walter needed to move on to other  
>> things.
>>
>> One particularly annoying issue is that even programs that don't  
>> ostensibly use anything from an imported module may balloon  
>> inexplicably in size. Consider:
>>
>> import std.path;
>> void main(){}
>>
>> This program, after stripping and all, has some 750KB in size. Removing  
>> the import line reduces the size to 218KB. That includes the runtime  
>> support, garbage collector, and such, and I'll consider it a baseline.  
>> (A similar but separate discussion could be focused on reducing the  
>> baseline size, but herein I'll consider it constant.)
>>
>> What we'd simply want is to be able to import stuff without blatantly  
>> paying for what we don't use. If a program imports std.path and uses no  
>> function from it, it should be as large as a program without the  
>> import. Furthermore, the increase should be incremental - using 2-3  
>> functions from std.path should only increase the executable size by a  
>> little, not suddenly link in all code in that module.
>>
>> But in experiments it seemed like program size would increase in sudden  
>> amounts when certain modules were included. After much investigation we  
>> figured that the following fateful causal sequence happened:
>>
>> 1. Some modules define static constructors with "static this()" or  
>> "shared static this()", and/or static destructors.
>>
>> 2. These constructors/destructors are linked in automatically whenever  
>> a module is included.
>>
>> 3. Importing a module with a static constructor (or destructor) will  
>> generate its ModuleInfo structure, which contains static information  
>> about all module members. In particular, it keeps virtual table  
>> pointers for all classes defined inside the module.
>>
>> 4. That means generating ModuleInfo references all virtual functions  
>> defined in that module, whether they're used or not.
>>
>> 5. The phenomenon is transitive, e.g. even if std.path has no static  
>> constructors but imports std.datetime which does, a ModuleInfo is  
>> generated for std.path too, in addition to the one for std.datetime. So  
>> now classes inside std.path (if any) will be all linked in.
>>
>> 6. It follows that a module that defines classes which in turn use  
>> other functions in other modules, and has static constructors (or  
>> includes other modules that do) will balloon the size of the executable  
>> suddenly.
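
A minimal D sketch of steps 1 to 4, assuming a hypothetical module with one static constructor and one class. The names `counter`, `Widget`, and `modulesWithClasses` are illustrative only and do not come from Phobos. It walks the runtime's ModuleInfo records and lists the modules whose `localClasses` table is non-empty, which is the table the chain above blames for pulling class code into the link.

```d
// Hypothetical module: one static constructor plus one class.
int counter;

static this()
{
    // Runs during runtime startup, before main().
    counter = 1;
}

class Widget
{
    // Methods are virtual by default in D, so this one gets a vtable
    // slot even if nothing ever calls it.
    int id() { return 42; }
}

// Collect the names of all linked-in modules whose ModuleInfo
// lists at least one class.
string[] modulesWithClasses()
{
    string[] names;
    foreach (m; ModuleInfo)   // druntime iterates every registered ModuleInfo
    {
        if (m is null)
            continue;
        if (m.localClasses.length > 0)
            names ~= m.name;
    }
    return names;
}
```

Printing the result of `modulesWithClasses()` from `main` with and without an extra import is one rough way to see which modules come along for the ride.
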
>>
>> There are a few approaches that we can use to improve the state of  
>> affairs.
>>
>> A. On the library side, use static constructors and destructors  
>> sparingly inside druntime and std. We can use lazy initialization  
>> instead of compulsively initializing library internals. I think this is  
>> often a worthy thing to do in any case (dynamic libraries etc) because  
>> it only does work if and when work needs to be done at the small cost  
>> of a check upon each use.
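
To make approach A concrete, here is a small sketch of swapping a static constructor for lazy initialization. The table `squares` and the functions `buildSquares` and `square` are made up for illustration; nothing here is taken from std.datetime or any other Phobos module.

```d
// Hypothetical lookup table that a static constructor would otherwise
// fill eagerly:
//     static this() { squares = buildSquares(); }
private int[] squares;   // module-level variables are thread-local by default in D

private int[] buildSquares()
{
    auto t = new int[](16);
    foreach (i, ref v; t)
        v = cast(int)(i * i);
    return t;
}

// Accessor that initializes on first use; the null check is the
// small per-call cost mentioned above.
int square(size_t i)
{
    if (squares is null)
        squares = buildSquares();
    return squares[i];
}
```

Because `squares` is thread-local, each thread builds its own copy and no locking is needed. A `shared` or `__gshared` variable would need synchronization around the first initialization.
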
>>
>> B. On the compiler side, we could use a similar lazy initialization  
>> trick to only refer to class methods in the module if they're actually  
>> needed. I'm being vague here because I'm not sure what and how that can  
>> be done.
>>
>> Here's a list of all files in std using static cdtors:
>>
>> std/__fileinit.d
>> std/concurrency.d
>> std/cpuid.d
>> std/cstream.d
>> std/datebase.d
>> std/datetime.d
>> std/encoding.d
>> std/internal/math/biguintcore.d
>> std/internal/math/biguintx86.d
>> std/internal/processinit.d
>> std/internal/windows/advapi32.d
>> std/mmfile.d
>> std/parallelism.d
>> std/perf.d
>> std/socket.d
>> std/stdiobase.d
>> std/uri.d
>>
>> The majority of them don't do a lot of work and are not much used  
>> inside phobos, so they don't blow up the executable. The main one that  
>> could receive some attention is std.datetime. It has a few static ctors  
>> and a lot of classes. Essentially just importing std.datetime or any  
>> std module that transitively imports std.datetime (and there are many  
>> of them) ends up linking in most of Phobos and blows the size up from  
>> the 218KB baseline to 700KB.
>>
>> Jonathan, could I impose on you to replace all static cdtors in  
>> std.datetime with lazy initialization? I looked through it and it  
>> strikes me as a reasonably simple job, but I think you'd know better  
>> what to do than me.
>>
>> A similar effort could be conducted to reduce or eliminate static  
>> cdtors from druntime. I made the experiment of commenting them all out, and  
>> that reduced the size of the baseline from 218KB to 200KB. This is a  
>> good amount, but not as dramatic as what we can get by working on  
>> std.datetime.
>>
>>
>> Thanks,
>>
>> Andrei
>
> We'd need the linker to do any of this. Unreferenced symbols should  
> be output using a kind of vague linkage (multiobj partly does this).  
> I-reference-everything stuff like ModuleInfos should only create weak  
> references. This means that localClasses might contain only part of  
> the actual module. People can use the designated export attribute to  
> forcefully output unused symbols.

More concretely: if we output weakly defined symbols (null) for whatever
a ModuleInfo references, then the linker should not open further object
files to find a definition. But if another definition is linked in, it
will replace the weak definition. The program would then need to skip the
dummy symbols (null) at runtime.
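
The runtime half of the weak-definition idea, skipping null placeholders, could look roughly like the sketch below. This is only an illustration under stated assumptions: the table is an ordinary array standing in for what weakly defined symbols would provide, no real weak linkage is involved, and `Factory`, `makeObject`, and `instantiatePresent` are hypothetical names.

```d
// Hypothetical stand-in for a ModuleInfo table: entries whose definition
// was never linked in are null, as a weak dummy definition would leave them.
alias Factory = Object function();

Object makeObject() { return new Object; }

// Call every factory that is actually present, skipping null placeholders.
size_t instantiatePresent(const Factory[] table)
{
    size_t made;
    foreach (f; table)
    {
        if (f is null)
            continue;   // dummy symbol: the real definition was not linked
        auto o = f();
        if (o !is null)
            ++made;
    }
    return made;
}
```

Any code that walks such a table, for example a lookup of classes by name, would need the same null check.
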