reggae v0.10.0 - The meta build system just got better
Adam D Ruppe
destructionator at gmail.com
Mon Sep 18 01:00:20 UTC 2023
On Friday, 15 September 2023 at 20:22:50 UTC, Atila Neves wrote:
> An argument could be made that it could/should install the
> dependencies such that only one `-I` flag is needed.
Indeed, this would be god tier.
> ~190k SLOC (not counting the many dub dependencies) killed dmd
> on a system with 64GB RAM + 64GB swap after over a minute. Even
> if it worked, it'd be much, much slower.
What you do with the lines of code is *far* more important than
how many there are.
The arsd library has about 219,000 lines of text if you delete
the Windows-only and obsolete modules (doing so just so I can
actually dmd *.d here on my Linux box). This includes comments
and such; dscanner --sloc reports about 98,000.
$ wc *.d
<snip>
218983 870208 7134770 total
$ dscanner --sloc *.d
<snip>
total: 98645
Let's compile it all:
$ /usr/bin/time dmd *.d -L-L/usr/local/pgsql/lib -unittest -L-lX11
5.35user 0.72system 0:06.08elapsed 99%CPU (0avgtext+0avgdata 1852460maxresident)k
0inputs+70464outputs (0major+536358minor)pagefaults 0swaps
That's a little bit slow, over 5 seconds. About 1.3 of those
seconds are spent in the linker; the other 4 belong to dmd
itself (dmd -c). It also used almost 2 GB of RAM, more than it
probably should, but it worked fine.
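(If you want to see where the time goes yourself, the compile
and link steps can be timed separately. This is just a sketch
using the same flags as above; add -main if nothing in the set
defines a main:)

$ /usr/bin/time dmd -c *.d -unittest
$ /usr/bin/time dmd *.o -L-L/usr/local/pgsql/lib -L-lX11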
My computer btw is a budget model circa 2016. Nothing
extraordinary about its hardware.
But notice it isn't actually running out of RAM or melting the
CPU over a period of minutes, despite being six figures of
lines of code by any measure.
On the other hand, compile:
enum a = () {
string s;
foreach(i; 0 .. 20_000_000_000)
s ~= 'a';
return s;
}();
Don't actually do it, but you can imagine what will happen: 6
lines that can spin your CPU and explode your memory. Indeed,
even just importing this module, even if the build system tried
not to compile it again, will cause the same problem.
The arsd libs are written - for the most part, there are some
exceptions - with compile speed in mind. If I see my build slow
down, I investigate why. Most problems like this can be fixed!
In fact, let's take that snippet and talk about it. I had to
remove *several* zeroes to get it to work at all without
freezing up my computer; with a 100,000 item loop, it just
barely worked. Even 200,000 made it OOM.
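For reference, the measured variant is just the snippet above
with the zeroes removed - my reconstruction:

enum a = () {
    string s;
    foreach(i; 0 .. 100_000)
        s ~= 'a';
    return s;
}();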
But ok, a 100,000 item append:
0.53user 1.52system 0:02.17elapsed 95%CPU (0avgtext+0avgdata 4912656maxresident)k
About 5 GB of RAM devoured by these few lines, taking 2 seconds
to run. What are some ways we can fix this? The ~= operator is
actually *awful* at CTFE: its behavior is quadratic, since each
append copies the whole string so far (...or worse, I didn't
confirm this today, but it is obviously bad). So you can fix
this pretty easily:
enum string a = () {
    // preallocate the buffer instead of appending
    char[] s = new char[](100000);
    foreach(ref ch; s)
        ch = 'a';
    return s;
}();
0.17user 0.03system 0:00.21elapsed 98%CPU (0avgtext+0avgdata 45748maxresident)k
16inputs+1408outputs (0major+21995minor)pagefaults 0swaps
Over 10x faster to compile, 1/100th of the RAM, same result.
Real world code is frequently doing more than this example, and
rewriting it to work like this might take some real effort...
but the results are worth it.
And btw, try this: import this module and check your time/memory
stats. Even if the module itself isn't recompiled, the CTFE runs
whenever the module is so much as imported, so you gain
*nothing* from separate compilation!
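Here's a small repro of the effect, with file names of my own
choosing. Compiling app.d alone still pays the whole CTFE cost,
because dmd has to semantically analyze heavy.d when it is
imported:

// heavy.d
module heavy;
enum a = () {
    string s;
    foreach(i; 0 .. 100_000)
        s ~= 'a';
    return s;
}();

// app.d
import heavy; // merely importing runs the CTFE above
void main() {}

$ /usr/bin/time dmd -c app.d   # heavy.d isn't compiled, but the time/RAM hit shows up anyway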
...but there are times when you can gain a LOT from separate
compilation in situations like this, if you can move the CTFE
into some private thing not exposed in the interface. In most
cases that requires some work by the lib author too. An example
where you can gain a lot is when something does a lot of
internal code generation but exposes a small interface, for
example a scripting language wrapper. (Though script wrappers
can also be made to compile reasonably efficiently: preallocate
buffers; keep your generated functions short - again, the
codegen has quadratic behavior, so many small functions work
better than one big one; factor the code well, so you can
minimize the amount of generated code and call back to generic
things, e.g. type erasure; collapse template instances; and
keep CTFE things CTFE-only with a variety of techniques, so
they are not codegened unless they are actually necessary.)
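As a sketch of the "many small generated functions calling back
to generic code" idea - the names here are hypothetical, not
from arsd:

// a hypothetical wrapper generator: each generated function is
// a thin shim forwarding to one shared, hand-written dispatcher,
// so the mixed-in code stays short and the real logic is
// compiled exactly once
string makeShim(string name) {
    return "void " ~ name ~ "_shim(void*[] args) { dispatch(\"" ~ name ~ "\", args); }\n";
}

// the generic, type-erased part: ordinary code, no codegen blowup
void dispatch(string name, void*[] args) {
    // runtime lookup and call, shared by every shim
}

mixin(makeShim("foo"));
mixin(makeShim("bar"));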
My arsd.script and arsd.cgi can wrap large numbers of functions
and classes reasonably well, but that's why programs using them
tend to be multi-second builds... note that it's the *programs*
using them; separately compiling the libraries doesn't help.
You'd have to structure the code to keep those codegen parts
internal to a package with a minimal interface; then separately
compiling those internal components might win.
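One way to structure that in D, sketched with hypothetical
names: keep the heavy CTFE in a module you compile once, and
give importers a hand-written .di interface file (dmd picks up
a .di in preference to the .d when resolving an import), so
their builds never see the initializer:

// api.d - compile once, separately: dmd -c api.d
module api;

private enum string table = () {
    char[] s = new char[](100_000);
    foreach(ref ch; s)
        ch = 'a';
    return s;
}();

string lookup() { return table; }

// api.di - interface only; importers read this instead, so
// none of the CTFE above runs in their builds
module api;
string lookup();

// client build: dmd client.d api.o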
But this is a fairly niche case. Yes, I know there's one major
commercial D user who does exactly this. But that's the
exception, not the rule.