compile time compression for associative array literal

Mon Aug 23 14:04:05 UTC 2021

On Monday, 23 August 2021 at 11:53:46 UTC, ag0aep6g wrote:
> On 23.08.21 08:14, Brian Tiffin wrote:
>> From ~~a~~ little reading, it seems associative array literal 
>> initialization is still pending for global scope, but allowed 
>> in a module constructor?  *If I understood my quick skim so 
>> far*.
>> 
>> ```d
>> immutable string[string] things;
>> static (this) {
>>     things = ["key1": "value 1", "key2": "value 2"];
>> }
>> ```
>
> (Typo: It's `static this()`.)
>
Yep, that's a typo.
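
For the record, here is the snippet with the constructor spelled 
correctly.  It's a sketch that follows the later example in the 
reply and uses `shared static this()`; as far as I can tell, an 
`immutable` module-level variable should be set from the shared 
constructor rather than the thread-local `static this()`.

```d
// Module-level AA: string keys, string values.
immutable string[string] things;

// Runs once at program start-up, before main, so it may assign
// the immutable AA with an ordinary AA literal.
shared static this()
{
    things = ["key1": "value 1", "key2": "value 2"];
}
```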

>> Is there a magic incantation that could convert the values to 
>> a `std.zlib.compress`ed ubyte array, at compile time?  So the 
>> object code gets keys:compvals instead of the full string 
>> value?
>
> There's a big roadblock: std.zlib.compress cannot go through 
> CTFE, because the source code of zlib isn't available to the 
> compiler; it's not even D code.
>
> Maybe there's a CTFE-able compression library on dub. If not, 
> you can write your own function and run that through CTFE. 
> Example with simple run-length encoding:
>
> ----
> uint[] my_compress(string s)
> {
>     import std.algorithm: group;
>     import std.string: representation;
>     uint[] compressed;
>     foreach (c_n; group(s.representation))
>     {
>         compressed ~= [c_n[0], c_n[1]];
>     }
>     return compressed;
> }
>
> string my_uncompress(const(uint)[] compressed)
> {
>     import std.conv: to;
>     string uncompressed = "";
>     for (; compressed.length >= 2; compressed = compressed[2 .. $])
>     {
>         foreach (i; 0 .. compressed[1])
>         {
>             uncompressed ~= compressed[0].to!char;
>         }
>     }
>     return uncompressed;
> }
>
> import std.array: replicate;
>
> /* CTFE compression: */
> enum compressed = my_compress("f" ~ "o".replicate(100_000) ~ "bar");
>
> immutable string[string] things;
> shared static this()
> {
>     /* Runtime decompression: */
>     things = ["key1": my_uncompress(compressed)];
> }
> ----
>
> If you compile that, the object file should be far smaller than 
> 100,000 bytes, thanks to the compression.

Cool.  So it might not be obvious, but there is a path to this 
little nicety.
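
To convince myself the work really happens at compile time, a 
`static assert` can check the compressed data inside the compiler.  
This sketch copies the reply's `my_compress` (slightly condensed) so 
it stands on its own; the exact counts assume that run-length scheme.

```d
import std.algorithm.iteration : group;
import std.array : replicate;
import std.string : representation;

// The reply's run-length scheme: pairs of [character code, run length].
uint[] my_compress(string s)
{
    uint[] compressed;
    foreach (c_n; group(s.representation))
        compressed ~= [c_n[0], c_n[1]];
    return compressed;
}

// "f", 100_000 copies of "o", then "bar": 100,004 chars in five runs.
enum compressed = my_compress("f" ~ "o".replicate(100_000) ~ "bar");

// Evaluated by the compiler: only ten uints (40 bytes) remain.
static assert(compressed.length == 10);
static assert(compressed[0] == 'f' && compressed[1] == 1);
static assert(compressed[2] == 'o' && compressed[3] == 100_000);
```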

>
> [...]
>> I'm not sure about
>> 
>> a) if code in a module constructor is even a candidate for 
>> CTFE?
>
> The word "candidate" might indicate a common misunderstanding 
> of CTFE. CTFE doesn't look for candidates. It's not an 
> optimization. The language dictates which values go through 
> CTFE.
>
> In a way, static constructors are the opposite of CTFE. 
> Initializers in module scope do go through CTFE. When you have 
> code that you cannot (or don't want to) put through CTFE, you 
> put it in a static constructor.
>
> You can still trigger CTFE within a static constructor by other 
> means (e.g., `enum`), but the static constructor itself is just 
> another function as far as CTFE is concerned.

Ok.  I'm hoping this gets easier to reason about once I get 
further up the D learning curve.
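
Writing out what I think that means, as a sketch: the same plain 
function runs at compile time or at run time depending only on 
where it is called from.  `countVowels` is a made-up helper for 
illustration.

```d
// Hypothetical helper: nothing marks it as "CTFE-able"; it is just a
// function the compile-time interpreter happens to be able to run.
uint countVowels(string s)
{
    uint n;
    foreach (c; s)
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
            ++n;
    return n;
}

// Module-scope initializer: must be a constant, so the language runs CTFE.
immutable uint atCompileTime = countVowels("compile time");
static assert(atCompileTime == 5);

uint atRunTime;
uint viaEnum;

static this()
{
    // An ordinary call, executed when the program (thread) starts.
    atRunTime = countVowels("run time");

    // An enum needs a compile-time value, so this call goes through
    // CTFE even though it sits inside the runtime constructor.
    enum folded = countVowels("forced via enum");
    viaEnum = folded;
}
```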

>
>> b) what a cast might look like to get a `q"DELIM ... DELIM"` 
>> delimited string for use as input to std.zlib.compress?
>
> A cast to get a string literal? That doesn't make sense.

No, no it doesn't.  And it didn't help that I had the order of AA 
key and value syntax backwards in my head when I was typing in 
the question.  I was thinking it was `key[value]`, not the proper 
`value[key]`.
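
A tiny reminder sketch for myself of the `Value[Key]` order (the 
variable names are just for illustration):

```d
// AA types read as Value[Key]: the value type comes first.
int[string] wordCounts;        // string keys, int values
ubyte[][string] packedValues;  // string keys, ubyte[] values

// Indexing hands back the value type.
static assert(is(typeof(wordCounts["x"]) == int));
static assert(is(typeof(packedValues["x"]) == ubyte[]));
```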

So in this case, `ubyte[][string]` (`ubyte[]` values keyed by 
`string`) was what I *think* I'd be aiming for as the AA type 
spec.  The input to compress is `const(void)[]`, so I figured I 
needed to cast the type-inferred delimited string literal before 
passing it to compress.  More things to learn.  ;-)
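
For plain runtime use (no CTFE), my understanding is that std.zlib 
can fill a `ubyte[][string]` directly: a `string` converts to 
`const(void)[]` on its own, and `uncompress` hands back `void[]` 
that can be cast to char data.  Since zlib can't run under CTFE, 
this does *not* shrink the binary; the uncompressed literal is 
still in there.  A sketch, with a made-up key and value:

```d
import std.zlib : compress, uncompress;

// Compressed values keyed by name; filled at program start-up.
ubyte[][string] packed;

static this()
{
    // No cast needed: string implicitly converts to const(void)[].
    packed["key1"] = compress("value 1, value 1, value 1, value 1");
}

// Decompress one entry back into a string.
string unpack(string key)
{
    // uncompress returns void[]; view it as chars, then make an immutable copy.
    return (cast(const(char)[]) uncompress(packed[key])).idup;
}
```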

I can't claim to be on solid ground yet when it comes to some 
areas of D syntax.

>
> You might be looking for `import("some_file")`. That gives you 
> the contents of a file as a string. You can then run that 
> string through your compression function in CTFE, put the 
> resulting compressed data into the object file, and decompress 
> it at runtime (like the example above does).

That's the goal.  It's an optional goal at this point.  I'm not 
*really* worried about object code size yet, but figured this 
would be a neat way to shrink the compiled code generated from 
some large COBOL source fragments embedded in D source.
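
A sketch of how that might look with a home-grown CTFE compressor.  
`rlePack`/`rleUnpack` are hypothetical names for a byte-sized 
variant of the run-length idea from the reply.  Run-length encoding 
only pays off on long runs (like space padding); on ordinary text it 
can even make things bigger, so the real COBOL would want a proper 
text compressor (maybe something on dub).  The fragment could also 
come from a file via `import("fragment.cbl")` plus the compiler's 
`-J` switch; that file name is made up.

```d
import std.algorithm.iteration : group;
import std.string : representation;

// Hypothetical byte-level RLE: [byte, count, byte, count, ...].
// Runs longer than 255 are split so every count fits in one ubyte.
ubyte[] rlePack(string s)
{
    ubyte[] result;
    foreach (run; group(s.representation))
    {
        uint count = run[1];
        while (count > 0)
        {
            ubyte n = cast(ubyte)(count > 255 ? 255 : count);
            result ~= run[0];
            result ~= n;
            count -= n;
        }
    }
    return result;
}

// Expands the pairs back into text.
string rleUnpack(const(ubyte)[] data)
{
    char[] result;
    for (; data.length >= 2; data = data[2 .. $])
        foreach (_; 0 .. data[1])
            result ~= cast(char) data[0];
    return result.idup;
}

// A heredoc-style delimited string is already a plain `string`.
enum cobolFragment = q"COBOL
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HELLO.
COBOL";

// Packed by CTFE; the enum holds only the encoded bytes.
enum packedFragment = rlePack(cobolFragment);

// The compiler checks the round trip, so a broken encoder fails the build.
static assert(rleUnpack(packedFragment) == cobolFragment);
```

Having the compiler do both the packing and the round-trip check 
is what keeps the hand-copying maintenance headache away.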

COBOL programmer me might have planned to run the fragments 
through a compressor, then copy those outputs to the D source by 
hand, but that would be a maintenance headache and make for far 
less grokkable code.

Thanks for the hints, ag0aep6g.  You've given me some more paths 
to explore.

Have good.