A modest proposal: eliminate template code bloat

Sun Apr 8 05:37:14 PDT 2012

Am Sun, 08 Apr 2012 15:01:56 +0400
schrieb Dmitry Olshansky <dmitry.olsh at gmail.com>:

> I think it's been ages since I meant to ask why nobody (as in compiler 
> vendors) does what I think is rather simple optimization.
> 
> In the short term the plan is to introduce a "link-time" flavored 
> optimization at code generation or (better) link step.
> 
> For simplicity let's assume compiler does all of the following during 
> generation of an object file.
> 
> 1. Every time a function is generated (or pretty much any symbol) not 
> only a size calculated but also a checksum* of it's data.
> (If we go for link-time optimization we should find a place to stick it 
> to in the object file)
> 
> 2. Compiler maintains a hash-table of symbol_size ---> list( ~ array) of 
> pairs (references to data, checksum) of all symbols with given size. 
> Call it a duplicate table. Every function generated and more generally 
> global immutable data should end up there.
> 
> 3. After any function was generated compiler checks an entry in the 
> duplicate table that matches size, followed by matching checksum and 
> only then (if required) doing a straight memcmp. If it happens that 
> there is a match compiler just throws generated code away and _aliases_ 
> it's symbol to that of a matched entry.
> (so there has to be an alias table if there isn't one already)
> 
> *Yes, checksum. I think it should be real simple and easy to parallel 
> hash function. The original checksum is no reliable but some amount 
> balancing and profiling are obviously required when picking this function.
> 
> Applicability:
> It's not only const-immutable bloat, it can be alleviated with inout. 
> Yet there are plenty of places the exact same code is being generated: 
> e.g. sealed containers of int vs uint, std.find on dchar[] vs 
> uint[]/int[] an so on.
> In general, the coarse grained parametrization is the root of all evil 
> and it is inevitable since we are just humans after all.
> 
> Notes:
> 1. If we do checksum calculation on the fly during codegen it gets at 
> virtually no cost as the data is in CPU data cache. Preliminary version 
> can avoid hacking this part of backend though.
> 
> 2. By _alias_ I mean the ability of compiler to emit references to a 
> given symbol as if it was some other symbol (should be really straight 
> forward).
> 
> 3. Linker have more data and is able to achieve colossal size savings, 
> essentially running through the same algorithm before actually linking 
> things. Again it's easily separable step and can be an opt-in.
> 
> 4. Ironically the same exact thing works with any kind of immutable data 
> structures. It looks like string pooling is superseded by this proposal.
> 
> Thoughts?

Thoughts? Nothing much. I thought of that a while ago, but as an external program, that finds function calls by disassembling and removing dead/duplicate code. So I agree with you. A similar feature is a CTFE cache (or general code cache) that checksums a function's source code and gets the compiled version from a cache.
Template bloat could be especially important to 'fix' on embedded systems. But I don't consider it important enough at the moment. :/   Let's wait till the bugs and important features are implemented or hack the compiler ourselves.

-- 
Marco