Policy for exposing range structs
Liran Zvibel via Digitalmars-d
digitalmars-d at puremagic.com
Wed Mar 30 12:19:38 PDT 2016
On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
> Compression in the usual sense won't help. Sure, it might
> reduce the object file size, but the full string will again
> have to be generated first, still requiring absurd amounts of time
> and space. The latter is definitely not negligible for symbol
> names of several hundreds of kilobytes; it shows up prominently
> in compiler profiles of affected Weka builds.
We love Voldemort types at Weka, and use them a lot in our
non-gc-allocating ranges and algorithm libraries. Also, we
liberally nest templates inside of other templates.
I don't think we could do many of the things we do if we had to
define everything at module level. This flexibility is amazing
for us and part of the reason we love D.
But, as David said -- it comes at a great price for us.
I just processed our biggest executable, and came up with the
following numbers:
Total symbols: 99649
Symbols longer than 1k: 9639
Symbols longer than 500k: 102
Symbols longer than 1M: 62
The longest symbols are about 5M bytes!
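For anyone who wants to produce the same kind of numbers for their
own binary, here is a minimal sketch in Python. It is not the script
we used. It assumes the symbol names have already been dumped to a
list of strings (for example from the output of a tool like nm), and
the function name and the exact thresholds (1k = 1000 bytes, and so
on) are assumptions made for illustration.

```python
def symbol_length_stats(symbols, thresholds=(1_000, 500_000, 1_000_000)):
    """Count symbols and how many exceed each length threshold (in bytes).

    `symbols` is any iterable of mangled symbol names. The default
    thresholds are assumed values standing in for "1k", "500k" and "1M".
    """
    lengths = [len(s.encode("utf-8")) for s in symbols]
    stats = {"total": len(lengths), "longest": max(lengths, default=0)}
    for t in thresholds:
        # Strictly "longer than" the threshold, as in the numbers above.
        stats[t] = sum(1 for n in lengths if n > t)
    return stats
```

Calling symbol_length_stats(names) returns a dict with the total
count, the longest length, and one count per threshold.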
This affects our exe sizes in a terrible way, and also increases
our compile and link times considerably. I will only be able to
come up with statistics on how much time is wasted on overly long
symbols after we fix it, but obviously this is a major problem
for us.
I think we should try the solution proposed by Anon, as it has a
good possibility of saving quite a bit.
It's important to make sure that when a template is given as a
template parameter, the complete template is treated as the LName.
Thinking about the compression idea by Andrei, I think we get
such long names because huge symbols are passed as Voldemort type
names to template parameters, and are then repeated several times
in the name of the new template.
Think of a 0.5M symbol passed a few times to a template; this is
probably how we get to 5M symbols.
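The growth is easy to see with a bit of arithmetic. The sketch below
is only a model, not a measurement of the real mangler: it assumes
every level of template nesting embeds the full name of the previous
level a fixed number of times, plus a fixed overhead. The function
name and all parameters are made up for illustration.

```python
def nested_symbol_length(base_len, repeats, depth, overhead=0):
    """Model the length of a symbol after `depth` levels of nesting.

    Assumption: each level embeds the previous level's full name
    `repeats` times and adds `overhead` bytes of its own.
    """
    length = base_len
    for _ in range(depth):
        length = overhead + repeats * length
    return length
```

Under this model, a 100-byte name repeated 3 times per level reaches
nested_symbol_length(100, 3, 5) == 24300 bytes after five levels, so
the size grows exponentially with nesting depth.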
This could end up being too complex, but suppose we assign
"Huffman coding"-like short names to the complete template names
in a module scope (let's say, only if the template name is longer
than 30 bytes). We could then replace a very long string with its
Huffman-coded version and, coupled with the LName+Number idea
above, shorten symbol names considerably.
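To make the substitution idea concrete, here is a rough sketch in
Python. It is not the actual proposal and not real Huffman coding,
just a plain dictionary substitution: the first time a long name is
seen it is spelled out in full, and every later occurrence is
replaced by a short numbered code. The "#N" code format, the
function name, and treating a symbol as a flat list of component
names are all assumptions made for illustration.

```python
def shorten(components, min_len=30):
    """Replace repeated long component names with short numbered codes.

    `components` is a list of name strings making up one symbol.
    Names longer than `min_len` get a code of the (hypothetical)
    form "#N". The first occurrence stays in full so the result can
    be expanded again without a separate table.
    """
    table = {}  # long name -> short code
    out = []
    for name in components:
        if len(name) <= min_len:
            out.append(name)          # short names are left alone
        elif name in table:
            out.append(table[name])   # repeat: emit the short code
        else:
            table[name] = "#%d" % len(table)
            out.append(name)          # first occurrence, in full
    return out, table
```

With a 40-byte name passed three times, only the first copy is kept
in full and the other two shrink to two bytes each.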
An initial implementation could start with just the LName#
solution, and then we can see whether we also have to recursively
couple it with Huffman coding of the resulting template names.
Liran