A modest proposal: eliminate template code bloat

H. S. Teoh hsteoh at quickfur.ath.cx
Sun Apr 8 07:18:26 PDT 2012


On Sun, Apr 08, 2012 at 03:01:56PM +0400, Dmitry Olshansky wrote:
> I think it's been ages since I meant to ask why nobody (as in
> compiler vendors) does what I think is rather simple optimization.
> 
> In the short term the plan is to introduce a "link-time" flavored
> optimization at code generation or (better) link step.

This would be incompatible with how current (non-dmd) linkers work. But
I do like the idea. Perhaps if it works well, other linkers will adopt
it? (Just like how the gcc linker adopted duplicate template code
elimination due to C++ templates.)


> For simplicity let's assume compiler does all of the following
> during generation of an object file.
> 
> 1. Every time a function is generated (or pretty much any symbol)
> not only a size calculated but also a checksum* of it's data.
> (If we go for link-time optimization we should find a place to stick
> it to in the object file)

We'd have to make sure the checksum doesn't end up in the final
executable though, otherwise the bloat may negate any gains we've made.


> 2. Compiler maintains a hash-table of symbol_size ---> list( ~
> array) of pairs (references to data, checksum) of all symbols with
> given size. Call it a duplicate table. Every function generated and
> more generally global immutable data should end up there.
> 
> 3. After any function was generated compiler checks an entry in the
> duplicate table that matches size, followed by matching checksum and
> only then (if required) doing a straight memcmp. If it happens that
> there is a match compiler just throws generated code away and
> _aliases_ it's symbol to that of a matched entry.
> (so there has to be an alias table if there isn't one already)

I think you don't even need an alias table; IIRC the OS dynamic linker
can easily handle symbols that have the same value (i.e. that point to
the same place). All you have to do is to change the value of the
duplicated symbols so that they all point to the same address.


[...]
> Applicability:
> It's not only const-immutable bloat, it can be alleviated with
> inout. Yet there are plenty of places the exact same code is being
> generated: e.g. sealed containers of int vs uint, std.find on
> dchar[] vs uint[]/int[] an so on.
> In general, the coarse grained parametrization is the root of all
> evil and it is inevitable since we are just humans after all.

I'm not sure I understand the last sentence there, but duplicate code
elimination is definitely a big plus. It will also mean that we can use
templates more freely without having to worry about template bloat.


> Notes:
> 1. If we do checksum calculation on the fly during codegen it gets
> at virtually no cost as the data is in CPU data cache. Preliminary
> version can avoid hacking this part of backend though.
> 
> 2. By _alias_ I mean the ability of compiler to emit references to a
> given symbol as if it was some other symbol (should be really
> straight forward).

Like I said, I think this isn't even necessary, if the compiler can just
generate the same value for the duplicated symbols.


> 3. Linker have more data and is able to achieve colossal size
> savings, essentially running through the same algorithm before
> actually linking things. Again it's easily separable step and can be
> an opt-in.

This assumes the (maybe external) linker knows how to take advantage of
the info. But IMO, linkers *should* be a lot smarter than they currently
are, so I don't see a problem with this as long as a dumb linker will
still produce a working executable (just without the space savings).

Alternatively we can have an external pre-link tool that scans a given
set of object files and eliminates duplicated code (by turning
duplicated symbols into external references in all but one of the
instances), before we hand the files off to the OS's native linker.
Come to think of it, this might be a good way to experiment with this
idea before we commit lots of effort into integrating it with a real
linker.


> 4. Ironically the same exact thing works with any kind of immutable
> data structures. It looks like string pooling is superseded by this
> proposal.
[...]

Not really... string pooling can take advantage of overlapping
(sub)strings, but I don't think you can do that with code. But I think
your idea has a lot of merit. I'm for making linkers smarter than they
currently are.


T

-- 
It always amuses me that Windows has a Safe Mode during bootup. Does
that mean that Windows is normally unsafe?


More information about the Digitalmars-d mailing list