A modest proposal: eliminate template code bloat

Dmitry Olshansky dmitry.olsh at gmail.com
Sun Apr 8 04:01:56 PDT 2012


I think it's been ages since I meant to ask why nobody (as in compiler 
vendors) does what I think is rather simple optimization.

In the short term the plan is to introduce a "link-time" flavored 
optimization at code generation or (better) link step.

For simplicity let's assume compiler does all of the following during 
generation of an object file.

1. Every time a function is generated (or pretty much any symbol) not 
only a size calculated but also a checksum* of it's data.
(If we go for link-time optimization we should find a place to stick it 
to in the object file)

2. Compiler maintains a hash-table of symbol_size ---> list( ~ array) of 
pairs (references to data, checksum) of all symbols with given size. 
Call it a duplicate table. Every function generated and more generally 
global immutable data should end up there.

3. After any function was generated compiler checks an entry in the 
duplicate table that matches size, followed by matching checksum and 
only then (if required) doing a straight memcmp. If it happens that 
there is a match compiler just throws generated code away and _aliases_ 
it's symbol to that of a matched entry.
(so there has to be an alias table if there isn't one already)

*Yes, checksum. I think it should be real simple and easy to parallel 
hash function. The original checksum is no reliable but some amount 
balancing and profiling are obviously required when picking this function.

Applicability:
It's not only const-immutable bloat, it can be alleviated with inout. 
Yet there are plenty of places the exact same code is being generated: 
e.g. sealed containers of int vs uint, std.find on dchar[] vs 
uint[]/int[] an so on.
In general, the coarse grained parametrization is the root of all evil 
and it is inevitable since we are just humans after all.

Notes:
1. If we do checksum calculation on the fly during codegen it gets at 
virtually no cost as the data is in CPU data cache. Preliminary version 
can avoid hacking this part of backend though.

2. By _alias_ I mean the ability of compiler to emit references to a 
given symbol as if it was some other symbol (should be really straight 
forward).

3. Linker have more data and is able to achieve colossal size savings, 
essentially running through the same algorithm before actually linking 
things. Again it's easily separable step and can be an opt-in.

4. Ironically the same exact thing works with any kind of immutable data 
structures. It looks like string pooling is superseded by this proposal.

Thoughts?

-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list