reggae v0.10.0 - The meta build system just got better
Adam D Ruppe
destructionator at gmail.com
Mon Sep 18 01:00:20 UTC 2023
On Friday, 15 September 2023 at 20:22:50 UTC, Atila Neves wrote:
> An argument could be made that it could/should install the
> dependencies such that only one `-I` flag is needed.
Indeed, this would be god tier.
> ~190k SLOC (not counting the many dub dependencies) killed dmd
> on a system with 64GB RAM + 64GB swap after over a minute. Even
> if it worked, it'd be much, much slower.
What you do with the lines of code is *far* more important than
how many there are.
The arsd library has about 219,000 lines of text if you delete
the Windows-only and obsolete modules (doing so just so I can
actually dmd *.d here on my Linux box). This includes comments
and such; dscanner --sloc reports about 98,000.
$ wc *.d
<snip>
218983 870208 7134770 total
$ dscanner --sloc *.d
<snip>
total: 98645
Let's compile it all:
$ /usr/bin/time dmd *.d -L-L/usr/local/pgsql/lib -unittest -L-lX11
5.35user 0.72system 0:06.08elapsed 99%CPU (0avgtext+0avgdata 1852460maxresident)k
0inputs+70464outputs (0major+536358minor)pagefaults 0swaps
That's a little bit slow, over 5 seconds. About 1.3 of those
seconds are spent in the linker; the other 4 belong to dmd
itself (dmd -c). It also used almost 2 GB of RAM, more than it
probably should, but it worked fine.
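(If you want to see where the time goes yourself, the compile
and link steps can be timed separately. This is just a sketch
using the same flags as above; add -main if nothing in the set
defines a main:)

$ /usr/bin/time dmd -c *.d -unittest
$ /usr/bin/time dmd *.o -L-L/usr/local/pgsql/lib -L-lX11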
My computer btw is a budget model circa 2016. Nothing
extraordinary about its hardware.
But notice it isn't actually running out of RAM or melting the
CPU over a period of minutes, despite being six figures of
lines of code by any measure.
On the other hand, compile:
enum a = () {
string s;
foreach(i; 0 .. 20_000_000_000)
s ~= 'a';
return s;
}();
Don't actually do it, but you can imagine what will happen: 6
lines that can spin your CPU and explode your memory. Indeed,
even just importing this module, even if the build system tried
not to compile it again, will cause the same problem.
The arsd libs are written - for the most part, there are some
exceptions - with compile speed in mind. If I see my build slow
down, I investigate why. Most problems like this can be fixed!
In fact, let's take that snippet and talk about it. I had to
remove *several* zeroes to get it to work at all without
freezing up my computer; with a 100,000 item loop, it just
barely worked. Even 200,000 made it OOM.
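For reference, the measured variant is just the snippet above
with the zeroes removed - my reconstruction:

enum a = () {
    string s;
    foreach(i; 0 .. 100_000)
        s ~= 'a';
    return s;
}();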
But ok, a 100,000 item append:
0.53user 1.52system 0:02.17elapsed 95%CPU (0avgtext+0avgdata 4912656maxresident)k
About 5 GB of RAM devoured by these few lines, taking 2 seconds
to run. What are some ways we can fix this? The ~= operator is
actually *awful* at CTFE: its behavior is quadratic, since each
append copies the whole string so far (...or worse, I didn't
confirm this today, but it is obviously bad). So you can fix
this pretty easily:
enum string a = () {
    // preallocate the buffer instead of appending
    char[] s = new char[](100000);
    foreach(ref ch; s)
        ch = 'a';
    return s;
}();
0.17user 0.03system 0:00.21elapsed 98%CPU (0avgtext+0avgdata 45748maxresident)k
16inputs+1408outputs (0major+21995minor)pagefaults 0swaps
Over 10x faster to compile, 1/100th of the RAM, same result.
Real world code is frequently doing more than this example, and
rewriting it to work like this might take some real effort...
but the results are worth it.
And btw, try this: import this module and check your time/memory
stats. Even if the module itself isn't recompiled, the CTFE runs
whenever the module is so much as imported, so you gain
*nothing* from separate compilation!
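Here's a small repro of the effect, with file names of my own
choosing. Compiling app.d alone still pays the whole CTFE cost,
because dmd has to semantically analyze heavy.d when it is
imported:

// heavy.d
module heavy;
enum a = () {
    string s;
    foreach(i; 0 .. 100_000)
        s ~= 'a';
    return s;
}();

// app.d
import heavy; // merely importing runs the CTFE above
void main() {}

$ /usr/bin/time dmd -c app.d   # heavy.d isn't compiled, but the time/RAM hit shows up anyway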
...but there are times when you can gain a LOT from separate
compilation in situations like this, if you can move the CTFE
into some private thing not exposed in the interface. In most
cases that requires some work by the lib author too. An example
where you can gain a lot is when something does a lot of
internal code generation but exposes a small interface, for
example a scripting language wrapper. (Though script wrappers
can also be made to compile reasonably efficiently: preallocate
buffers; keep your generated functions short - again, the
codegen has quadratic behavior, so many small functions work
better than one big one; factor the code well, so you can
minimize the amount of generated code and call back to generic
things, e.g. type erasure; collapse template instances; and
keep CTFE things CTFE-only with a variety of techniques, so
they are not codegened unless they are actually necessary.)
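As a sketch of the "many small generated functions calling back
to generic code" idea - the names here are hypothetical, not
from arsd:

// a hypothetical wrapper generator: each generated function is
// a thin shim forwarding to one shared, hand-written dispatcher,
// so the mixed-in code stays short and the real logic is
// compiled exactly once
string makeShim(string name) {
    return "void " ~ name ~ "_shim(void*[] args) { dispatch(\"" ~ name ~ "\", args); }\n";
}

// the generic, type-erased part: ordinary code, no codegen blowup
void dispatch(string name, void*[] args) {
    // runtime lookup and call, shared by every shim
}

mixin(makeShim("foo"));
mixin(makeShim("bar"));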
My arsd.script and arsd.cgi can wrap large numbers of functions
and classes reasonably well, but that's why programs using them
tend to be multi-second builds... note that it's the *programs*
using them; separately compiling the libraries doesn't help.
You'd have to structure the code to keep those codegen parts
internal to a package with a minimal interface; then separately
compiling those internal components might win.
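One way to structure that in D, sketched with hypothetical
names: keep the heavy CTFE in a module you compile once, and
give importers a hand-written .di interface file (dmd picks up
a .di in preference to the .d when resolving an import), so
their builds never see the initializer:

// api.d - compile once, separately: dmd -c api.d
module api;

private enum string table = () {
    char[] s = new char[](100_000);
    foreach(ref ch; s)
        ch = 'a';
    return s;
}();

string lookup() { return table; }

// api.di - interface only; importers read this instead, so
// none of the CTFE above runs in their builds
module api;
string lookup();

// client build: dmd client.d api.o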
But this is a fairly niche case. Yes, I know there's one major
commercial D user who does exactly this. But that's the
exception, not the rule.