Lib change leads to larger executables

Wed Feb 21 17:24:34 PST 2007

Walter Bright wrote:
> kris wrote:
> 
>> Walter Bright wrote:
>>
>>> Some strategies:
>>>
>>> 1) minimize importing of modules that are never used
>>>
>>> 2) for modules with a lot of code in them, import them as a .di file 
>>> rather than a .d
>>>
>>> 3) create a separate module that defines the relevant typeinfo's, and 
>>> put that first in the library
>>
>>
>>
>> 1) Tango takes this very seriously ... more so than Phobos, for example.
> 
> 
> Sure, but in this particular case, it seems that "core" is being 
> imported without referencing code in it. The only reason the compiler 
> doesn't generate the char[][] TypeInfo is because an import defines it. 
> The compiler does work on the assumption that if a module is imported, 
> then it will also be linked in.

This core module, and the entire locale package it resides in, is /not/ 
imported by anything. I spelled that out clearly before. You're making 
an assumption it is, somehow ... well, it is not. You can deduce that 
from the fact that the link succeeds perfectly well without that package 
existing in the library.

> 
>> 2) That is something that could be used in certain scenarios, but is 
>> not a general or practical solution for widespread use of D.
> 
> 
> The compiler can automatically generate .di files. You're probably going 
> to want to do that anyway as part of polishing the library - it speeds 
> compilation times, aids proper encapsulation, etc. That's why the gc 
> does it, and I've been meaning to do it for other bulky libraries like 
> std.regexp.

You may remember that many of us find .di files to be something "less" 
than an effective approach to library interfacing? As to it making 
smaller, faster compilations -- try it on the Win32 header files ... it 
makes them bigger and noticeably slower to parse.

This is neither a valid nor a practical solution.

> 
> I wish to point out that the current scheme does *work*, it generates 
> working executables. In the age of demand paged executable loading 
> (which both Linux and Windows do), unused code in the executable never 
> even gets loaded into memory. The downside to size is really in shipping 
> code over a network (also in embedded systems).
> 
> So I disagree with your characterization of it as impractical.

Oh, ok. It all depends on what one expects from a toolset. Good point.

> 
> For professional libraries, it is not unreasonable to expect some extra 
> effort in tuning the libraries to minimize dependency. This is a normal 
> process, it's been going on at least for the 25 years I've been doing 
> it. Standard C runtime libraries, for example, have been *extensively* 
> tweaked and tuned in this manner, and that's just boring old C. They are 
> not just big lumps of code.
> 
>> 3) Hack around an undocumented and poorly understood problem in 
>> developer-land. Great.
> 
> 
> I think you understand the problem now, and the solution. Every 
> developer of professional libraries should understand this problem, it 
> crops up with most every language. If a developer doesn't understand it, 
> one winds up with something like Java where even the simplest hello 
> world winds up pulling in the entire Java runtime library, because 
> dependencies were not engineered properly.

This is a problem with the toolchain, Walter. Plain and simple. The 
linker picks up an arbitrary, yes arbitrary, module from the library 
because the D language design is such that it highlights a deficiency in 
the existing toolchain. See below:

You can claim all you like that devs should learn to deal with it, but 
the fact remains that it took us more than a day to track down this 
obscure problem to being a char[][] decl. It will take just as long for 
the next one, and perhaps longer. Where does the cycle end?

The toolchain currently operates in a haphazard fashion, linking in 
/whatever/ module-chain happens to declare a typeinfo for char[][]. And 
it does this because of the way D generates the typeinfo. The process is 
broken, pure and simple. We should accept this and try to figure out how 
to resolve it instead.
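
To make the mechanism concrete, here is a toy Python model of archive-style symbol resolution. It is a sketch under stated assumptions, not the behaviour of any real linker: the module names, the symbol strings, and the "first definition in library order wins" rule are all hypothetical stand-ins for what is described above.

```python
# Toy model only: module names, symbol strings and the "first definition in
# library order wins" rule are hypothetical, not taken from a real linker.

def link(roots, library):
    """Return names of library members pulled in to resolve undefined symbols."""
    defined = set()
    for obj in roots:
        defined |= obj["defines"]
    # Symbols the program needs but does not define itself.
    undefined = [s for obj in roots for s in obj["needs"] if s not in defined]
    pulled = []
    while undefined:
        sym = undefined.pop(0)
        if sym in defined:
            continue
        # Scan the library in order; the first member defining the symbol is
        # linked in whole, together with everything that member needs.
        for name, member in library:
            if sym in member["defines"]:
                pulled.append(name)
                defined |= member["defines"]
                undefined += [s for s in member["needs"] if s not in defined]
                break
        else:
            raise LookupError("unresolved symbol: " + sym)
    return pulled

TI = "TypeInfo for char[][]"  # placeholder string, not a real mangled name

# The program never imports the locale package; it only uses a char[][].
APP = {"defines": {"main"}, "needs": [TI, "io.print"]}

LIBRARY = [
    ("locale.core", {"defines": {TI, "locale.core.init"},
                     "needs": ["locale.data.table"]}),
    ("locale.data", {"defines": {"locale.data.table"}, "needs": []}),
    ("io", {"defines": {"io.print"}, "needs": []}),
]

# A small module that defines only the typeinfo, placed first in the library.
TYPEINFO_ONLY = ("typeinfo", {"defines": {TI}, "needs": []})

print(link([APP], LIBRARY))                    # ['locale.core', 'io', 'locale.data']
print(link([APP], [TYPEINFO_ONLY] + LIBRARY))  # ['typeinfo', 'io']
```

In this model the program drags in the whole locale chain purely because that member happens to be the first to define the typeinfo. Putting a typeinfo-only module first (strategy 3 quoted earlier) changes the outcome, which shows how much the result hinges on library order rather than on anything the program imports.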

> 
>> you might as well add:
>>
>> 4) have the user instantiate a pointless and magic char[][] in their 
>> own program, so that they can link with the Tango library?
> 
> 
> I wouldn't add it, as I would expect the library developer to take care 
> of such things by adding them to the Tango library as part of the 
> routine process of optimizing executable size by minimizing dependencies.
> 

Minimizing dependencies? What are you talking about? Those deps are 
produced purely by the D compiler, and not the code design.

>> None of this is gonna fly in practice, and you surely know that?
> 
> 
> For features like runtime type identification, etc., that are generated 
> by the compiler (instead of explicitly by the programmer), then the 
> dependencies they generate are a fact of life.
> 
> Optimizing the size of a generated program is a routine programming 
> task. It isn't something new with D. I've been doing this for 25 years.

Entirely disingenuous. This is not about "optimization" at all ... it 
is about a broken toolchain. Nothing more.

I hope you'll find a way to move this toward a resolution instead of 
labeling it something else.