A modest proposal: eliminate template code bloat
Dmitry Olshansky
dmitry.olsh at gmail.com
Sun Apr 8 11:49:43 PDT 2012
On 08.04.2012 21:24, H. S. Teoh wrote:
>
> Yeah, that's what I was thinking of. This would be a very big gain for
> the new AA implementation, for example. I wouldn't have to worry so much
> about template bloat if most of the instantiations are going to get
> merged anyway. :-)
>
Right the advantage is that one doesn't have to use tricks. One can
simply assume the compiler is doing it's job. That's what happened with
inlining some 10+ years ago.
>
> [...]
>>> Not really... string pooling can take advantage of overlapping
>>> (sub)strings, but I don't think you can do that with code. But I
>>> think your idea has a lot of merit. I'm for making linkers smarter
>>> than they currently are.
>>>
>>
>> Sorry, it's just me running ahead of train somewhat. Basically once
>> this initial version is in place I have one cool refinement for it in
>> mind. For now we just need to keep the hash function transitive and
>> associative for the gods sake. 128/64bit checksum please ;)
> [...]
>
> And what would the cool refinement be? :-)
>
>
OK, you sort of forced my hand.
I hope you have been warned about spoilers before ;)
The refinement is merging prefixes and suffixes of course.
And for that one needs to calculate hashes for all of prefixes and all
of suffixes. I will define _all_ later on.
First observation is that if you calculated partial checksums for
prefixes you have sums for all suffixes and vise-versa.
Namely:
//prefix ends on i, suffix start on i
sum_prefix[i] = total - sum_suffix[i];
that is derived from the fact that:
total = sum_prefix[i] + sum_suffix[i];
(taking into account the properties of +/- in the finite field)
Now take the original idea, substitute checksum with an array of partial
sums and lengths (btw lengths follow the same rule of prefix<->suffix)
and you know what this refinement all about.
In fact this all was easy, the trick is fitting this into the time frame
of compilation.
To that end one should do full duplicate elimination first then mess
with merging.
Then we see that generally we are about to do O((M1*...*Mn)^2) work
where n - is number of functions, Mi - number of prefixes (= suffixes)
we defined for each. The constant C in front of (M1...Mn)^2 here is not
that big - it's comparing checksums & lengths but *sometimes* memcmp
still keep in mind that the number of all functions is big.
So we have to get around this monstrous workload. At the moment I see
the only way is doing this is use coarse grained prefixes and introduce
some threshold on maximum number of prefixes we account for. That about
defining above mentioned _all_ for prefixes.
Coarse grained means we do store partial checksums on a fixed block
boundary (say every 16 or 32 bytes) if that aligns properly with
instructions if not we just skip this prefix and move on.
(this also hopefully limits memory usage on partial sums array)
Another threshold is that we don't mess with partial sums for really BIG
functions. (might be not needed)
I think there is some space for other heuristics to try out but they are
mostly along the same lines.
P.S. Damn, I could have done a nice paper on that... too late :)
--
Dmitry Olshansky
More information about the Digitalmars-d
mailing list