Need a Faster Compressor
Anon via Digitalmars-d
digitalmars-d at puremagic.com
Tue May 24 12:16:07 PDT 2016
On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote:
> It's _exponential_ growth. We don't even want to spend the time
> and memory required to generate the strings.
>
> The reason we have this discussion is that the worst case isn't
> rare enough to make this argument. Why compress in the first
> place if mangled names don't grow large in practice?
Since Walter is evidently still having a hard time understanding
this, I've done a few more pathological cases, comparing LZ4 to
Back Referencing Named Types.
For BRNT I manually replaced struct names in the mangling with
N<number> numeric identifiers for all but one of their
appearances, to simulate what could actually be done by the
compiler. No other mangling changes (e.g., for eponymous
templates or hidden types) were applied.
auto foo(T)(T x)
{
static struct Vold { T payload; }
return Vold(x);
}
At a chain of length 10:
foo(5).foo.foo.foo.foo.foo.foo.foo.foo.foo
current: Generate a 57_581 byte symbol
lz4 -9: Generate a 57_581 byte symbol, then compress it to 412
bytes
lzma -9: Generate a 57_581 byte symbol, then compress it to 239
bytes
BRNT: Generate a 332 byte symbol
Upping it to twelve levels:
foo(5).foo.foo.foo.foo.foo.foo.foo.foo.foo.foo.foo
current: Generate a 230_489 byte symbol
lz4 -9: Generate a 230_489 byte symbol, then compress it to 1128
bytes
lzma -9: Generate a 230_489 byte symbol, then compress it to 294
bytes
BRNT: Generate a 393 byte symbol
Upping it to fourteen levels:
I'm too lazy to do more manual BRNT, so beyond this point its
numbers are estimated based on the previously established fact
that BRNT symbols grow linearly.
current: Generate a 922_127 byte symbol
lz4 -9: Generate a 922_127 byte symbol, then compress it to 4012
bytes
lzma -9: Generate a 922_127 byte symbol, compress it to 422 bytes
BRNT: Generate a ~457 byte symbol
Upping it to sixteen levels:
current: Generate a 3_688_679 byte symbol
lz4 -9: Generate a 3_688_679 byte symbol, then compress it to
15535 bytes
lzma -9: Generate a 3_688_679 byte symbol, then compress it to
840 bytes
BRNT: Generate a ~527 byte symbol
I want to let that sink in: in the general case, BRNT beats even
**LZMA**.
As if winning the compactness war while still being a symbol
linkers won't have problems with wasn't enough, this approach
also avoids generating giant symbols in the first place.
Compression and/or hashing cannot do that. If D really wants to,
it can compress symbols, but it *needs* to fix this problem
properly *first*.
More information about the Digitalmars-d
mailing list