compiled code file size

Manu turkeyman at gmail.com
Fri Sep 20 23:04:10 PDT 2013


On 21 September 2013 09:02, H. S. Teoh <hsteoh at quickfur.ath.cx> wrote:

> On Fri, Sep 20, 2013 at 05:04:23PM -0400, Nick Sabalausky wrote:
> > On Fri, 20 Sep 2013 21:45:48 +0200
> > "Temtaime" <temtaime at gmail.com> wrote:
> > >
> > > Software MUST run almost ANYWHERE and consume minimal
> > > resources.
> > >
> > > For example, I hate the 3dsmax developers when, on my game's map, it
> > > uses several GB of RAM and sometimes freezes, while Blender uses only
> > > 500 MB and runs fast. The only reason I use 3dsmax is its friendlier
> > > controls. But that is another story...
> > >
> > > Some users who don't have a "modern" PC will hate your app too, I
> > > think. One should optimize EVERYTHING one can.
> > >
> >
> > I agree with what you're saying here, but the problem is we're looking
> > at a difference of only a few hundred k.
> >
> > Heck, my primary PC was a 32-bit single-core right up until last year
> > (and I still use it as a secondary system), and I didn't care one bit
> > if a hello world was 1k or 1MB.
>
> I agree with the OP that dmd should improve dead-code culling, though.
> Recently Walter has started doing lazy template instantiation for
> imports, which begins to trim off some of the fat. But there's plenty of
> room for more improvements.
>
> For example, after seeing Walter's recent pulls, I got inspired to write
> a simple utility that takes the output of objdump -d (the disassembly of
> an executable) and parses it to extract code symbols from the program
> along with references to other symbols. It then builds a graph of how
> symbols reference each other, and performs some trivial reachability
> analysis on it. It revealed some startling results... like the fact that
> symbols from std.complex are included in a hello world program, even
> though complex numbers are never used!
>
> The ratio of total number of symbols to symbols transitively reachable
> from _Dmain is rather large, ranging from 5 (medium-sized, complex
> program) to about 30 (a hello world program). Now I'm not 100% confident
> about the accuracy of these numbers, since some symbols may be
> indirectly referenced, and thus missed in the graph built from parsing
> the disassembly. But still, even when taken as ballpark figures, it
> shows that there's a *lot* of room for improvement. Certainly, some of
> the unreferenced symbols are druntime overhead (used by startup/exit
> functions, etc.), but a ratio of *5*? That's a 5x executable size bloat.
> Even if we discount half of that for druntime overhead and indirect
> references... I mean, how many indirect references can you have?  I
> really can't convince myself that's "merely" druntime/phobos overhead.
> Especially when I see symbols from std.complex in a program that doesn't
> even use complex numbers. std.complex shouldn't be in there in the first
> place, before we even talk about template bloat.
>
>
> > How many real world programs are as trivial as a hello world? A few
> > maybe, but not many. Certainly not enough to actually add up to
> > anything significant, unless maybe you happen to be running on a 286 or
> > such.
> >
> > If we were talking about real-world D programs taking tens/hundreds of
> > MB more than they should, then that would be a problem. But they
> > don't. We're just talking about a few hundred k for an *entire* program.
>
> My numbers show otherwise. :) Well, OK, I'm counting symbols rather than
> size, and the count may not be 100% accurate. But it does show that we
> could improve. By a lot.
>
> A hello world program, according to my test, has a ratio of 30 between
> total symbols and symbols reachable from _Dmain, whereas a medium-sized
> complex program shows a ratio of around 5 (the symbol analyser program
> itself, which is significantly simpler than the complex program I
> tested, also shows a ratio of 5). So we can probably discount the hello
> world case, since most of the apparent bloat is probably just one-off
> overhead from druntime, etc. But the ratio of 5 for non-trivial
> programs? No matter how I try to rationalize it, I'm forced to conclude
> that there is a lot of room for improvement here. Surely *some*
> significant subset of these unreferenced symbols must be actually
> unreachable and can be pruned from the executable.
>
> I'll continue refining the analysis while Walter works on more lazy
> instantiations for imports. I'm expecting to see a lot of improvements
> in this area. :)
>

This is awesome.
What would be really awesome is if you integrated this into the D
auto-builder, and hacked it to publish the results somewhere for the latest
build.
It would be good to know when people write code that results in a
significant increase in coverage (particularly when it doesn't need to).
It would also provide very useful information for hackers who just want to
get in and do some work to try and trim it a bit.
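
For anyone who wants to experiment with the kind of analysis described in the
quoted message, here is a rough Python sketch. It is not H. S. Teoh's actual
utility, which is not shown in this thread. It assumes the usual GNU
`objdump -d` text layout (a header line such as `0000000000401000 <_Dmain>:`
followed by instruction lines that mention targets as `<symbol>` or
`<symbol+0x10>`). The function names `build_graph`, `reachable_from` and
`bloat_ratio` are made up for illustration. As the quoted message notes, this
approach misses indirect references (function pointers, vtables, data
sections), so the numbers are ballpark figures only.

```python
import re
from collections import deque

# Assumed header layout in `objdump -d` output: "<hex address> <symbol>:"
HEADER = re.compile(r"^[0-9a-f]+ <(?P<name>[^>]+)>:\s*$")
# Assumed reference layout on instruction lines: "<symbol>" or "<symbol+0x1f>"
REF = re.compile(r"<(?P<name>[^>+]+)(?:\+0x[0-9a-f]+)?>")


def build_graph(disassembly):
    """Map each defined code symbol to the set of symbols it mentions."""
    graph = {}
    current = None
    for line in disassembly.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group("name")
            graph.setdefault(current, set())
            continue
        if current is None:
            continue  # text before the first symbol header
        for ref in REF.finditer(line):
            target = ref.group("name")
            if target != current:  # ignore jumps within the same symbol
                graph[current].add(target)
    return graph


def reachable_from(graph, root):
    """Breadth-first search over symbols defined in the disassembly."""
    if root not in graph:
        return set()
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target in graph[node]:
            # Skip targets with no definition here (e.g. PLT stubs).
            if target in graph and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def bloat_ratio(graph, root="_Dmain"):
    """Total defined symbols divided by symbols reachable from root."""
    reachable = reachable_from(graph, root)
    if not reachable:
        raise ValueError("root symbol %r not found" % root)
    return len(graph) / len(reachable)
```

To try it, capture the output of `objdump -d ./yourprogram` as a string (for
example with `subprocess.run(..., capture_output=True, text=True).stdout`),
pass it to `build_graph`, and then call `bloat_ratio` on the result.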