Policy for exposing range structs
Steven Schveighoffer via Digitalmars-d
digitalmars-d at puremagic.com
Thu Mar 31 06:10:49 PDT 2016
On 3/30/16 3:19 PM, Liran Zvibel wrote:
> On Sunday, 27 March 2016 at 17:01:39 UTC, David Nadlinger wrote:
>> Compression in the usual sense won't help. Sure, it might reduce the
>> object file size, but the full string will again have to be generated
>> first, still requiring absurd amounts of time and space. The latter is
>> definitely not negligible for symbol names of several hundreds of
>> kilobytes; it shows up prominently in compiler profiles of affected
>> Weka builds.
>
> We love Voldemort types at Weka, and use them a lot in our
> non-gc-allocating ranges and algorithm libraries. Also, we liberally
> nest templates inside of other templates.
> I don't think we can do many of the things we do if we had to define
> everything at module level. This flexibility is amazing for us and part
> of the reason we love D.
Voldemort types are what cause the bloat; templates inside templates
aren't as much of a problem. A Voldemort type's symbol name has to
include the template parameter/function parameter types of the function
it resides in at least twice, and I think actually 3 times (given the
return type). If the template is just a template, they're included just
once. This is why moving the type outside the function is effective at
mitigation: each level of nesting then adds a fixed amount to the name
instead of multiplying its length. It's linear growth vs. exponential.
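To see where the repetition comes from, here is a minimal sketch that prints the mangled name of a Voldemort type next to that of an equivalent module-level template. The names Marker, wrapInner, wrapOuter and OuterWrapper are made up for illustration. The exact strings and counts depend on the compiler version, and newer compilers may compress repeated parts of a name, so treat the output as indicative only.

```d
import std.algorithm.searching : count;
import std.stdio : writefln;

// Hypothetical parameter type whose name is easy to spot in a mangled symbol.
struct Marker {}

// Voldemort style: Wrapper is declared inside the function template,
// so its symbol name embeds the enclosing function's signature.
auto wrapInner(T)(T value)
{
    static struct Wrapper { T payload; }
    return Wrapper(value);
}

// Module-level style: the same struct as an ordinary template.
struct OuterWrapper(T) { T payload; }

auto wrapOuter(T)(T value)
{
    return OuterWrapper!T(value);
}

void main()
{
    enum inner = typeof(wrapInner(Marker())).mangleof;
    enum outer = typeof(wrapOuter(Marker())).mangleof;

    // Print each name with its length and how often "Marker" is spelled out.
    writefln("Voldemort    (%s chars, 'Marker' x%s): %s",
             inner.length, count(inner, "Marker"), inner);
    writefln("module-level (%s chars, 'Marker' x%s): %s",
             outer.length, count(outer, "Marker"), outer);
}
```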
I too like Voldemort types, but I actually found moving the types
outside the functions quite straightforward. It's just annoying to have
to repeat the template parameters. If you make the types private, then
you can simply leave the template constraints off them and keep the
constraints on the functions only. It's a bad leak of implementation,
since now anything in the file has access to that type directly, but
it's better than the issues with Voldemort types.
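As a rough sketch of that refactoring (not code from iopipe; TakeEveryOther and everyOther are hypothetical names), the range struct sits at module level, is marked private, and carries no constraint of its own. Only the public function checks its argument.

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Private to this module: code in other files cannot name the type,
// so the constraint is written once, on the public function below.
private struct TakeEveryOther(R)
{
    R source;

    @property bool empty() { return source.empty; }
    @property auto front() { return source.front; }

    // Advance past the current element and skip the one after it.
    void popFront()
    {
        source.popFront();
        if (!source.empty)
            source.popFront();
    }
}

// Public entry point; callers receive the range via `auto`.
auto everyOther(R)(R source)
if (isInputRange!R)
{
    return TakeEveryOther!R(source);
}

void main()
{
    import std.stdio : writeln;
    writeln(everyOther([1, 2, 3, 4, 5]));
}
```

The cost is the one mentioned above: anything else in the same module can instantiate TakeEveryOther directly and bypass the constraint.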
See the update to my iopipe library here:
https://github.com/schveiguy/iopipe/commit/1b0696dc82fce500c6b314ec3d8e5e11e0c1bcd7
This one commit cut the binary size of my example program 'convert'
(https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d)
by over 90% (it went from 10MB to <1MB).
This also calmed down some REALLY horrible stack traces when I was
debugging. As in, I could actually understand what function it was
talking about, and it didn't take 10 seconds to print the stack trace.
>
> But, as David said -- it comes with a great price for us.
>
> I just processed our biggest executable, and came up with the following
> numbers:
> total symbols: 99649
> Symbols longer than 1k: 9639
> Symbols longer than 500k: 102
> Symbols longer than 1M: 62. The longest symbols are about 5M bytes!
>
> This affects our exe sizes in a terrible way, and also increases our
> compile and link times considerably. I will only be able to come up with
> statistics of how much time was wasted due to too-long-symbols after we
> fix it, but obviously this is a major problem for us.
From my testing, it doesn't take much to get to the point where the
linker is unusable. A simple struct nested through 15 calls to a
function makes the linker take an unreasonable amount of time (over 1.5
minutes; I didn't wait to see how long). See my bug report for details.
Another factor in the name length is the module name, which is included
in every type and function. So you have a factor like 3^15 for the name,
but then you multiply this by the length of the module names as well.
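A minimal sketch for measuring that growth yourself, assuming a compiler recent enough to support `static foreach`. The helpers wrap and nest are hypothetical and are not taken from the bug report. The depth is kept small on purpose; the numbers depend on the compiler version, and newer compilers may compress repeated parts of a name, in which case the growth looks much tamer.

```d
import std.stdio : writefln;

// A Voldemort type: Wrapper is only reachable through wrap's return value.
auto wrap(T)(T value)
{
    static struct Wrapper { T payload; }
    return Wrapper(value);
}

// Hypothetical helper: applies wrap() `depth` times, resolved at compile time.
auto nest(size_t depth, T)(T value)
{
    static if (depth == 0)
        return value;
    else
        return wrap(nest!(depth - 1)(value));
}

void main()
{
    // Print the mangled-name length of the result type for each nesting depth.
    static foreach (depth; 1 .. 7)
        writefln("depth %s: %s chars", depth,
                 typeof(nest!depth(0)).mangleof.length);
}
```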
> I think we should try the solution proposed by Anon, as it has a good
> possibility of saving quite a bit.
> It's important to make sure that when a template is given as a template
> parameter, the complete template is treated as the LName.
I hope this is given serious thought; it looks like someone has already
started an implementation.
Anon, it appears that your mechanism has been well received by a few
knowledgeable people here. I encourage you to solidify your proposal in
a DIP (D Improvement Proposal) here: http://wiki.dlang.org/DIPs.
-Steve