Policy for exposing range structs

Anon via Digitalmars-d digitalmars-d at puremagic.com
Thu Mar 31 10:30:44 PDT 2016


On Thursday, 31 March 2016 at 16:46:42 UTC, Adam D. Ruppe wrote:
> On Thursday, 31 March 2016 at 16:38:59 UTC, Anon wrote:
>> I've been spending my D time thinking about potential changes 
>> to how template string value parameters are encoded.
>
>
> How does it compare to simply gzipping the string and writing 
> it out with base62?

My encoding is shorter in the typical use case, at least when 
using xz instead of gzip. (xz made it quicker and easier to get raw 
compressed data without a header.)

1 = raw UTF-8, 2 = my encoder, 3 = `echo -n "$1" | xz -Fraw | base64`

---
1. some_identifier
2. some_identifier_
3. AQAOc29tZV9pZGVudGlmaWVyAA==

1. /usr/include/d/std/stdio.d
2. usrincludedstdstdiod_jqacdhbd
3. AQAZL3Vzci9pbmNsdWRlL2Qvc3RkL3N0ZGlvLmQA

1. Hello, World!
2. HelloWorld_0far4i
3. AQAMSGVsbG8sIFdvcmxkIQA=

1. こんにちは世界
2. XtdCDr5mL02g3rv
3. AQAU44GT44KT44Gr44Gh44Gv5LiW55WMAA==
---
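For anyone who wants to rerun column 3 without a shell, here is a rough Python equivalent. It assumes that `xz -Fraw` with no other options means a raw LZMA2 stream (no .xz container header) at xz's default preset 6, which is what Python's `lzma` module produces with `format=lzma.FORMAT_RAW` and an LZMA2 filter. One known difference: the `base64` tool wraps long output at 76 columns, while `base64.b64encode` does not.

```python
import base64
import lzma

# Assumption: `xz -Fraw` with no filter options means a raw LZMA2 stream
# (no .xz container header) using xz's default preset, 6.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

def xz_raw_base64(text: str) -> str:
    """Rough equivalent of: echo -n "$text" | xz -Fraw | base64"""
    raw = lzma.compress(text.encode("utf-8"), format=lzma.FORMAT_RAW, filters=FILTERS)
    # Unlike the base64 tool, b64encode never wraps lines at 76 columns.
    return base64.b64encode(raw).decode("ascii")

def decode_xz_raw_base64(encoded: str) -> str:
    """Inverse of xz_raw_base64, to confirm the round trip."""
    raw = base64.b64decode(encoded)
    return lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=FILTERS).decode("utf-8")

for text in ["some_identifier", "/usr/include/d/std/stdio.d",
             "Hello, World!", "こんにちは世界"]:
    encoded = xz_raw_base64(text)
    assert decode_xz_raw_base64(encoded) == text
    print(f"{len(text.encode('utf-8')):2d} bytes -> {len(encoded):2d} chars: {encoded}")
```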

The problem is that compression isn't magic: a string needs to be 
long enough and repetitive enough to compress well. If it isn't, 
compression makes the data grow, and base64 (4 output characters 
for every 3 input bytes) compounds that. For the sake of fairness, 
let's also try a larger, compressible string.

Input: 1000 lines, each with the text "Hello World"

1. 12000 bytes
2. 12008 bytes
3. 94 bytes
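The growth in the short cases can be worked out by hand. Judging from the outputs above, each short input ends up in a single stored (uncompressed) LZMA2 chunk, so n bytes of input cost n + 4 bytes of raw stream: a 3-byte chunk header (control byte 0x01 plus the size minus one as a 16-bit big-endian number) and a 1-byte end marker. Base64 then turns every 3 bytes into 4 characters. A small Python sketch of that arithmetic, assuming the input is stored uncompressed (which only holds for incompressible inputs of at most 64 KiB):

```python
# Sketch: predicted size of `xz -Fraw | base64` output when LZMA2 falls back
# to a single stored (uncompressed) chunk, which happens when compressing
# would not make the chunk smaller.

STORED_CHUNK_HEADER = 3  # control byte 0x01 + 16-bit big-endian (size - 1)
END_MARKER = 1           # a 0x00 byte terminates the LZMA2 stream

def base64_len(nbytes: int) -> int:
    """Characters of padded base64 needed for nbytes of input."""
    return 4 * ((nbytes + 2) // 3)

def stored_base64_len(n: int) -> int:
    """Predicted base64 length for an n-byte input stored uncompressed."""
    if not 1 <= n <= 65536:
        raise ValueError("a single stored LZMA2 chunk holds 1..65536 bytes")
    return base64_len(n + STORED_CHUNK_HEADER + END_MARKER)

for text in ["some_identifier", "/usr/include/d/std/stdio.d",
             "Hello, World!", "こんにちは世界"]:
    n = len(text.encode("utf-8"))
    predicted = stored_base64_len(n)
    print(f"{n:2d} bytes -> {predicted:2d} chars (+{predicted - n})")
```

The predicted lengths (28, 40, 24 and 36 characters) match column 3 of the short examples. The 1000-line input only comes out ahead because LZMA2 can replace nearly all of it with back-references to the first line.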

However, my encoding is still fairly compressible, so we *could* 
route it through the same compression if/when a symbol is 
determined to be compressible. For the input above, that yields 
114 bytes.

The other thing I really like about my encoder is that plain C 
identifiers are left verbatim, so they stay readable in the 
result. That would be especially nice with, e.g., opDispatch.

Would a hybrid approach (my encoding, optionally using 
compression when it would be advantageous) make sense? My encoder 
already has to process the whole string, so it could do some sort 
of analysis to estimate how compressible the result would be. I 
don't know what that would look like, but it could work.
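One way to sidestep estimating compressibility is to simply try it: compress the already-encoded string and keep whichever form is shorter. The Python sketch below does that. The `P`/`Z` tag characters are hypothetical, the LZMA2 settings mirror the `xz -Fraw` comparison above, and unpadded base64url stands in for the base62 Adam suggested (base64url's `-` would still not be valid in a symbol name). The encoder itself isn't shown; the example inputs are just strings from column 2 above plus a made-up repetitive one.

```python
import base64
import lzma

# Raw LZMA2 at xz's default preset, no container header (as in `xz -Fraw`).
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

# Hypothetical one-character tags so a demangler can tell the two forms
# apart; a real scheme would need to reserve something unambiguous.
PLAIN_TAG = "P"
PACKED_TAG = "Z"

def pack(encoded: str) -> str:
    """Compress an already-encoded string and render it as unpadded base64url."""
    raw = lzma.compress(encoded.encode("utf-8"), format=lzma.FORMAT_RAW, filters=FILTERS)
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

def hybrid(encoded: str) -> str:
    """Return whichever tagged form is shorter: the encoding as-is or compressed."""
    plain = PLAIN_TAG + encoded
    packed = PACKED_TAG + pack(encoded)
    return packed if len(packed) < len(plain) else plain

def unhybrid(symbol: str) -> str:
    """Recover the encoder's output from a tagged symbol."""
    tag, body = symbol[0], symbol[1:]
    if tag == PLAIN_TAG:
        return body
    if tag == PACKED_TAG:
        # Restore the '=' padding that pack() stripped.
        raw = base64.urlsafe_b64decode(body + "=" * (-len(body) % 4))
        return lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=FILTERS).decode("utf-8")
    raise ValueError(f"unknown tag {tag!r}")

print(hybrid("HelloWorld_0far4i"))                  # short: stays plain
print(len(hybrid("usrincludedstdstdiod_" * 200)))   # repetitive: gets packed
```

The cost is one compression pass per string value, plus one tag character on every symbol, including the ones that stay plain. A length threshold could skip the compression attempt for short strings that can never win.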

Alternatively, we could compress whole mangled names, not just 
the string values, but I don't know how desirable that is.

