Need a Faster Compressor

Anon via Digitalmars-d digitalmars-d at puremagic.com
Tue May 24 12:16:07 PDT 2016


On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote:
> It's _exponential_ growth. We don't even want to spend the time 
> and memory required to generate the strings.
>
> The reason we have this discussion is that the worst case isn't 
> rare enough to make this argument. Why compress in the first 
> place if mangled names don't grow large in practice?

Since Walter is evidently still having a hard time understanding 
this, I've done a few more pathological cases, comparing LZ4 to 
Back Referencing Named Types.

For BRNT I manually replaced struct names in the mangling with 
N<number> numeric identifiers for all but one of their 
appearances, to simulate what could actually be done by the 
compiler. No other mangling changes (e.g., for eponymous 
templates or hidden types) were applied.

auto foo(T)(T x)
{
     static struct Vold { T payload; }
     return Vold(x);
}


At a chain of length 10:
foo(5).foo.foo.foo.foo.foo.foo.foo.foo.foo

current: Generate a 57_581 byte symbol
lz4 -9: Generate a 57_581 byte symbol, then compress it to 412 
bytes
lzma -9: Generate a 57_581 byte symbol, then compress it to 239 
bytes
BRNT: Generate a 332 byte symbol


Upping it to twelve levels:
foo(5).foo.foo.foo.foo.foo.foo.foo.foo.foo.foo.foo

current: Generate a 230_489 byte symbol
lz4 -9: Generate a 230_489 byte symbol, then compress it to 1128 
bytes
lzma -9: Generate a 230_489 byte symbol, then compress it to 294 
bytes
BRNT: Generate a 393 byte symbol

Upping it to fourteen levels:

I'm too lazy to do more manual BRNT, so beyond this point its 
numbers are estimated based on the previously established fact 
that BRNT symbols grow linearly.

current: Generate a 922_127 byte symbol
lz4 -9: Generate a 922_127 byte symbol, then compress it to 4012 
bytes
lzma -9: Generate a 922_127 byte symbol, compress it to 422 bytes
BRNT: Generate a ~457 byte symbol

Upping it to sixteen levels:

current: Generate a 3_688_679 byte symbol
lz4 -9: Generate a 3_688_679 byte symbol, then compress it to 
15535 bytes
lzma -9: Generate a 3_688_679 byte symbol, then compress it to 
840 bytes
BRNT: Generate a ~527 byte symbol



I want to let that sink in: in the general case, BRNT beats even 
**LZMA**.



As if winning the compactness war while still being a symbol 
linkers won't have problems with wasn't enough, this approach 
also avoids generating giant symbols in the first place. 
Compression and/or hashing cannot do that. If D really wants to, 
it can compress symbols, but it *needs* to fix this problem 
properly *first*.


More information about the Digitalmars-d mailing list