Policy for exposing range structs
Anon via Digitalmars-d
digitalmars-d at puremagic.com
Thu Mar 31 10:30:44 PDT 2016
On Thursday, 31 March 2016 at 16:46:42 UTC, Adam D. Ruppe wrote:
> On Thursday, 31 March 2016 at 16:38:59 UTC, Anon wrote:
>> I've been spending my D time thinking about potential changes
>> to how template string value parameters are encoded.
>
>
> How does it compare to simply gzipping the string and writing
> it out with base62?
My encoding is shorter in the typical use case, at least when
using xz instead of gzip. (With xz it was quicker and easier to get
raw compressed data without a header.)
1 = raw UTF-8, 2 = my encoder, 3 = `echo -n "$1" | xz -Fraw | base64`
---
1. some_identifier
2. some_identifier_
3. AQAOc29tZV9pZGVudGlmaWVyAA==
1. /usr/include/d/std/stdio.d
2. usrincludedstdstdiod_jqacdhbd
3. AQAZL3Vzci9pbmNsdWRlL2Qvc3RkL3N0ZGlvLmQA
1. Hello, World!
2. HelloWorld_0far4i
3. AQAMSGVsbG8sIFdvcmxkIQA=
1. こんにちは世界
2. XtdCDr5mL02g3rv
3. AQAU44GT44KT44Gr44Gh44Gv5LiW55WMAA==
---
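For anyone wanting to reproduce column 3 without a shell, here is a minimal Python sketch. It assumes that `xz -Fraw` at its default settings corresponds to a raw LZMA2 stream at preset 6; the helper names `xz_raw_base64` and `unpack` are made up for illustration and are not part of any proposed encoder.

```python
import base64
import lzma

# Assumption: `xz -Fraw` with default options is a headerless LZMA2 stream
# at preset 6. Adjust the filter chain if your xz is configured differently.
RAW_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def xz_raw_base64(text: str) -> str:
    """Compress the UTF-8 bytes of `text` without a container, then base64 them."""
    packed = lzma.compress(
        text.encode("utf-8"), format=lzma.FORMAT_RAW, filters=RAW_FILTERS
    )
    return base64.b64encode(packed).decode("ascii")


def unpack(encoded: str) -> str:
    """Inverse of xz_raw_base64, used here only to check the round trip."""
    packed = base64.b64decode(encoded)
    raw = lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=RAW_FILTERS)
    return raw.decode("utf-8")


samples = [
    "some_identifier",
    "/usr/include/d/std/stdio.d",
    "Hello, World!",
    "こんにちは世界",
]

for s in samples:
    out = xz_raw_base64(s)
    # Print input size in bytes, output size in characters, and the output.
    print(len(s.encode("utf-8")), len(out), out)
```

For inputs this short the compressor has nothing to exploit, so the output is the input plus a few framing bytes, and base64 then inflates that by roughly a third.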
The problem is that compression isn't magical, and a string needs
to be long enough and have enough repetition to compress well. If
it isn't, compression causes the data to grow, and base64
compounds that. For the sake of fairness, let's also do a larger
(compressible) string.
Input: 1000 lines, each with the text "Hello World"
1. 12000 bytes
2. 12008 bytes
3. 94 bytes
However, my encoding is still fairly compressible, so we *could*
route it through the same compression if/when a symbol is
determined to be compressible. That yields 114 bytes.
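The contrast between short and repetitive inputs can be checked with a small Python sketch. It assumes the same raw LZMA2 settings as the `xz -Fraw` command above (preset 6), and the function name `xz_raw_base64_len` is hypothetical. Exact byte counts may differ slightly from the figures quoted here, for example because the command-line `base64` tool wraps its output across lines.

```python
import base64
import lzma

# Assumption: raw LZMA2 at preset 6 stands in for `xz -Fraw`.
RAW_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def xz_raw_base64_len(data: bytes) -> int:
    """Length in characters of the base64 text of the raw-compressed data."""
    packed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=RAW_FILTERS)
    return len(base64.b64encode(packed))


short_input = b"Hello, World!"
long_input = b"Hello World\n" * 1000  # 1000 lines of 12 bytes each

print("short:", len(short_input), "->", xz_raw_base64_len(short_input))
print("long: ", len(long_input), "->", xz_raw_base64_len(long_input))
```

The short string grows, while the repetitive one shrinks by orders of magnitude, which is the trade-off described above.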
The other thing I really like about my encoder is that plain C
identifiers are left visible verbatim in the result. That would
be especially nice with, e.g., opDispatch.
Would a hybrid approach (my encoding, optionally using
compression when it would be advantageous) make sense? My encoder
already has to process the whole string, so it could do some sort
of analysis to estimate how compressible the result would be. I
don't know what that would look like, but it could work.
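One possible shape for the hybrid, sketched in Python under loud assumptions: the encoder's output is treated as an opaque string, the decision is made by trial compression rather than by a cheap estimate, and the one-character tags `r` (raw) and `z` (compressed) as well as the names `hybrid` and `unhybrid` are invented for this example. Base64 is used only to match the comparison above; a real mangling scheme would need an alphabet restricted to identifier characters.

```python
import base64
import lzma

# Assumption: raw LZMA2 at preset 6, matching the `xz -Fraw` comparison.
RAW_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def hybrid(encoded: str) -> str:
    """Keep `encoded` as-is unless compressing it yields a shorter string.

    The leading 'r' or 'z' is a hypothetical tag telling a decoder which
    branch was taken.
    """
    packed = lzma.compress(
        encoded.encode("utf-8"), format=lzma.FORMAT_RAW, filters=RAW_FILTERS
    )
    candidate = base64.b64encode(packed).decode("ascii")
    if len(candidate) < len(encoded):
        return "z" + candidate
    return "r" + encoded


def unhybrid(tagged: str) -> str:
    """Undo hybrid() by dispatching on the tag character."""
    tag, body = tagged[0], tagged[1:]
    if tag == "r":
        return body
    packed = base64.b64decode(body)
    raw = lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=RAW_FILTERS)
    return raw.decode("utf-8")


print(hybrid("some_identifier_"))           # short: stays readable
print(len(hybrid("HelloWorld_" * 1000)))    # repetitive: compressed branch
```

Trial compression costs a full compression pass per symbol. A cheaper estimator (for instance one based on repeated substrings seen while encoding) could replace the length comparison without changing the tagging scheme.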
Alternatively, we could do the compression on whole mangled names,
not just the string values, but I don't know how desirable that
is.