Policy for exposing range structs

Liran Zvibel via Digitalmars-d digitalmars-d at puremagic.com
Wed Mar 30 12:19:38 PDT 2016


On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
> Compression in the usual sense won't help. Sure, it might 
> reduce the object file size, but the full string will again 
> have to be generated first, still requiring absurd amounts of time 
> and space. The latter is definitely not negligible for symbol 
> names of several hundreds of kilobytes; it shows up prominently 
> in compiler profiles of affected Weka builds.

We love Voldemort types at Weka, and use them a lot in our 
non-gc-allocating ranges and algorithm libraries. Also, we 
liberally nest templates inside of other templates.
I don't think we could do many of the things we do if we had to 
define everything at module level. This flexibility is amazing 
for us and part of the reason we love D.

But, as David said -- it comes at a great price for us.

I just processed our biggest executable, and came up with the 
following numbers:
Total symbols: 99649
Symbols longer than 1k: 9639
Symbols longer than 500k: 102
Symbols longer than 1M: 62
The longest symbols are about 5M bytes!
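
For anyone who wants to gather comparable numbers on their own binaries, here is a minimal Python sketch. It is not the script used for the figures above. It assumes the symbol names come from nm-style text output with the name in the last whitespace-separated field, and that "1k" means 1000 bytes. Both helper names are hypothetical.

```python
def names_from_nm(lines):
    """Extract symbol names from nm-style lines.

    Assumption: the symbol name is the last whitespace-separated field.
    Blank lines are skipped.
    """
    names = []
    for line in lines:
        fields = line.split()
        if fields:
            names.append(fields[-1])
    return names


def count_longer_than(symbols, thresholds):
    """Return {threshold: number of symbols strictly longer than threshold}."""
    lengths = [len(s) for s in symbols]
    return {t: sum(1 for n in lengths if n > t) for t in thresholds}


if __name__ == "__main__":
    import sys

    # Usage (assumed): nm ./my_executable | python symstats.py
    symbols = names_from_nm(sys.stdin)
    print("Total symbols:", len(symbols))
    for limit, count in count_longer_than(symbols, [1_000, 500_000, 1_000_000]).items():
        print(f"Symbols longer than {limit}: {count}")
```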

This affects our exe sizes in a terrible way, and also increases 
our compile and link times considerably. I will only be able to 
come up with statistics on how much time is wasted due to 
overly long symbols after we fix it, but obviously this is a major 
problem for us.

I think we should try the solution proposed by Anon, as it has a 
good possibility of saving quite a bit.
It's important to make sure that when a template is given as a 
template parameter, the complete template is treated as the LName.

Thinking about the compression idea by Andrei, I think we get 
such long names since we have huge symbols that are being passed 
as Voldemort names to template parameters. Then we repeat the 
huge symbols several times in the new template.
Think of a 0.5M symbol passed a few times to a template; this is 
probably how we get to symbols of 5M bytes.
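
To make the arithmetic concrete, here is a small Python sketch of how the length grows when each template instantiation embeds the full name of its argument several times. The function name and all figures (repeat count, overhead, depth) are hypothetical and chosen only for illustration; they are not measurements from our build.

```python
def nested_length(base, repeats, overhead, depth):
    """Length of a name after `depth` levels of template nesting.

    Each level embeds the previous level's full name `repeats` times
    and adds `overhead` bytes for its own identifier.
    """
    length = base
    for _ in range(depth):
        length = overhead + repeats * length
    return length


# A 500,000-byte name embedded 3 times, at two nesting levels:
# 500,000 -> 1,500,000 -> 4,500,000 bytes.
print(nested_length(base=500_000, repeats=3, overhead=0, depth=2))
```

The growth is exponential in the nesting depth, which is why a handful of levels is enough to go from kilobytes to megabytes.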
This could end up being too complex, but suppose we assign "Huffman 
coding"-like short names to the complete template names in a module 
scope (let's say, only if the template name is longer than 30 
bytes). We would then be able to replace a very long string with 
its Huffman-coded version. Coupled with the LName+Number idea 
above, this would shorten symbol names considerably.

An initial implementation could start with just the LName# 
solution, and then we can see if we also have to recursively 
couple it with Huffman coding of the resulting template names.
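
As a rough illustration of the back-reference idea, here is a Python sketch. It is not Anon's actual proposal and not the compiler's mangling scheme: the `#<index>_` token, the function names, and the rule of indexing only components longer than 30 bytes are assumptions made for this example. It uses plain sequential indices rather than real Huffman codes, and it assumes components never start with a digit (as with D identifiers).

```python
def encode(components, min_len=30):
    """Encode name components as length-prefixed strings.

    A component longer than `min_len` is written in full the first time
    and as a hypothetical '#<index>_' back-reference on every repeat.
    """
    seen = {}  # long component -> index assigned at first occurrence
    out = []
    for comp in components:
        if comp in seen:
            out.append(f"#{seen[comp]}_")
        else:
            if len(comp) > min_len:
                seen[comp] = len(seen)
            out.append(f"{len(comp)}{comp}")
    return "".join(out)


def decode(encoded, min_len=30):
    """Inverse of encode(): rebuild the list of components."""
    table = []  # long components in order of first occurrence
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == "#":
            j = encoded.index("_", i)
            out.append(table[int(encoded[i + 1:j])])
            i = j + 1
        else:
            j = i
            while j < len(encoded) and encoded[j].isdigit():
                j += 1
            n = int(encoded[i:j])
            comp = encoded[j:j + n]
            if n > min_len:
                table.append(comp)
            out.append(comp)
            i = j + n
    return out


long_name = "x" * 40  # stands in for a huge template name
parts = ["foo", long_name, "bar", long_name, long_name, "foo"]
short = encode(parts)
print(len("".join(parts)), "->", len(short))
```

Each repeat of a long component costs a few bytes instead of its full length, so the encoded size grows with the number of distinct long names rather than with the number of times they are mentioned.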

Liran

