Policy for exposing range structs

Steven Schveighoffer via Digitalmars-d digitalmars-d at puremagic.com
Thu Mar 31 06:10:49 PDT 2016


On 3/30/16 3:19 PM, Liran Zvibel wrote:
> On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
>> Compression in the usual sense won't help. Sure, it might reduce the
>> object file size, but the full string will again have to be generated
>> first, still requiring absurd amounts of time and space. The latter is
>> definitely not negligible for symbol names of several hundreds of
>> kilobytes; it shows up prominently in compiler profiles of affected
>> Weka builds.
>
> We love Voldemort types at Weka, and use them a lot in our
> non-gc-allocating ranges and algorithm libraries. Also, we liberally
> nest templates inside of other templates.
> I don't think we can do many of the things we do if we had to define
> everything at module level. This flexibility is amazing for us and part
> of the reason we love D.

Voldemort types are what cause the bloat; templates inside templates
aren't as much of a problem. A Voldemort type has to include the
template parameter and function parameter types of the function it
resides in at least twice in its symbol name, and I think actually 3
times (given the return type). If the template is just a template, they
are included only once. This is why moving the type outside the function
is effective at mitigation. It's linear growth vs. exponential.
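
To make the comparison concrete, here is a minimal sketch of the two
styles. The names wrapVoldemort, wrapModule, Result and Wrapped are
hypothetical and exist only for this illustration. It prints the length
of each type's mangled name after three levels of nesting; the exact
numbers depend on the compiler version and its mangling scheme, so none
are quoted here.

```d
import std.stdio : writeln;

// Voldemort style: the struct lives inside the function, so its mangled
// name embeds the enclosing function's template and parameter types.
auto wrapVoldemort(T)(T inner)
{
    static struct Result
    {
        T inner;
    }
    return Result(inner);
}

// Module-level style: the struct is an ordinary template, so its mangled
// name mentions its template argument only once.
struct Wrapped(T)
{
    T inner;
}

auto wrapModule(T)(T inner)
{
    return Wrapped!T(inner);
}

void main()
{
    // Nest each wrapper three levels deep and compare symbol lengths.
    auto v = wrapVoldemort(wrapVoldemort(wrapVoldemort(1)));
    auto m = wrapModule(wrapModule(wrapModule(1)));
    writeln("Voldemort:    ", typeof(v).mangleof.length);
    writeln("module-level: ", typeof(m).mangleof.length);
}
```

Adding more levels of nesting to both calls should show how quickly the
two lengths diverge on a given compiler.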

I too like Voldemort types, but I actually found moving the types
outside the functions quite straightforward. It's just annoying to have
to repeat the template parameters. If you make the types private, you
can simply leave the template constraints off them, since only your own
functions instantiate them. It's a bad leak of implementation, since now
anything in the file has access to that type directly, but it's better
than the issues with Voldemort types.
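
As a rough sketch of that pattern (not the actual iopipe code), the
example below keeps the template constraint on the public function only
and leaves the private module-level struct unconstrained. The names
everyOther and EveryOther are hypothetical, invented for this example.

```d
import std.range.primitives : isInputRange, empty, front, popFront;

// Private to the module, so no constraint is repeated here: only
// everyOther below is expected to instantiate it.
private struct EveryOther(R)
{
    R source;

    @property bool empty() { return source.empty; }
    @property auto front() { return source.front; }

    void popFront()
    {
        // Skip one element after the one just consumed.
        source.popFront();
        if (!source.empty)
            source.popFront();
    }
}

// The public entry point carries the constraint once.
auto everyOther(R)(R source)
if (isInputRange!R)
{
    return EveryOther!R(source);
}

void main()
{
    import std.algorithm.comparison : equal;
    assert(everyOther([1, 2, 3, 4, 5]).equal([1, 3, 5]));
}
```

Callers in other modules can still use the result through auto; what
they lose is the guarantee that nothing else in this module touches
EveryOther directly.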

See the update to my iopipe library here: 
https://github.com/schveiguy/iopipe/commit/1b0696dc82fce500c6b314ec3d8e5e11e0c1bcd7

This one commit shrank my example program 'convert'
(https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d)
by over 90% in binary size (from 10MB to <1MB).

This also calmed down some REALLY horrible stack traces when I was
debugging. As in, I could actually understand what function it was
talking about, and it didn't take 10 seconds to print a stack trace.

>
> But, as David said -- it comes with a great price for us.
>
> I just processed our biggest executable, and came up with the following
> numbers:
> total symbols: 99649
> Symbols longer than 1k: 9639
> Symbols longer than 500k: 102
> Symbols longer than 1M: 62. The longest symbols are about 5M bytes!
>
> This affects our exe sizes in a terrible way, and also increases our
> compile and link times considerably. I will only be able to come up with
> statistics of how much time was wasted due to too-long-symbols after we
> fix it, but obviously this is a major problem for us.

From my testing, it doesn't take much to get to the point where the
linker is unusable. A simple struct nested in 15 calls to a function
makes the linker take an unreasonable amount of time (over 1.5 minutes;
I didn't wait to see how long). See my bug report for details.

Another factor in the name length is the module name, which is included
in every type and function. So you have a factor like 3^15 for the name,
but then you multiply this by the module names as well.
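
To put a number on that factor, the sketch below simply multiplies by 3
once per nesting level. It assumes the enclosing signature really is
repeated exactly 3 times at every level, which is a simplification; the
helper name repeatFactor is hypothetical, and it computes only the
multiplier, not an actual symbol length.

```d
import std.stdio : writefln;

// Multiplier applied to the innermost name after `depth` levels of
// nesting, assuming `perLevel` repetitions at each level.
ulong repeatFactor(uint depth, ulong perLevel = 3)
{
    ulong factor = 1;
    foreach (_; 0 .. depth)
        factor *= perLevel;
    return factor;
}

void main()
{
    foreach (depth; [1u, 5u, 10u, 15u])
        writefln("depth %2s: factor %s", depth, repeatFactor(depth));

    // 3^15, the factor mentioned above.
    assert(repeatFactor(15) == 14_348_907);
}
```

So even a name of a few bytes at the innermost level is repeated on the
order of 14 million times at depth 15 under this assumption.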

> I think we should try the solution proposed by Anon, as it has a good
> possibility of saving quite a bit.
> It's important to make sure that when a template is given as a template
> parameter, the complete template is treated as the LName.

I hope this is given serious thought; it looks like someone has already
started implementation.

Anon, it appears that your mechanism has been well received by a few 
knowledgeable people here. I encourage you to solidify your proposal in 
a DIP (D Improvement Proposal) here: http://wiki.dlang.org/DIPs.

-Steve
