D build and SCons

H. S. Teoh hsteoh at quickfur.ath.cx
Fri Feb 2 22:01:04 UTC 2018


On Thu, Feb 01, 2018 at 12:56:28PM +0000, Russel Winder wrote:
[...]
> Apologies for taking so long to get to this.

Not a problem, you and I are both busy, and it's perfectly
understandable that we can't respond to things instantly.


> On Thu, 2017-12-28 at 10:21 -0800, H. S. Teoh via Digitalmars-d wrote:
[...]
> > OK, I may have worded things poorly here.  What I meant was that
> > with "traditional" build systems like make or SCons, whenever you
> > needed to rebuild the source tree, the tool has to scan the *entire*
> > source tree in order to discover what needs to be rebuilt. I.e.,
> > it's O(N) where N is the size of the source tree.  Whereas with tup,
> > it uses the Linux kernel's inotify mechanism to learn about which
> > file(s) being monitored have been changed since the last invocation,
> > so that it can scan the changed files in O(n) time where n is the
> > number of changed files, and in the usual case, n is much smaller
> > than N. It's still linear in terms of the size of the change, but
> > sublinear in terms of the size of the entire source tree.
> 
> This I can agree with. SCons definitely has to check hashes to
> determine which files have changed in a "not just space change" way on
> the leaves of the build ADG. I am not sure what Ninja does, but yes
> Tup uses inotify to filter the list of touched, but not necessarily
> changed, files. For my projects build time generally dominates check
> time so I don't see much difference. Except that Ninja is way faster
> than Make as a backend to CMake.

In small projects like my personal ones, SCons still does a fast enough
job that I don't really care about the difference between O(N) and O(n).
So I still use SCons for them -- SCons does have a really nice interface
and is generally pleasant to work with, so I don't feel an immediate
need to improve the build system.
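
For concreteness, a small-project SConstruct need not be much more than
the following -- a minimal sketch, assuming dmd is on the PATH and
SCons' stock dmd tool; the file names are made up:

	# SConstruct: build a small D program with SCons' stock dmd tool.
	env = Environment(tools=['dmd', 'link'])
	env.Append(DFLAGS=['-O', '-release'])      # dmd optimization flags
	env.Program('myprog', Glob('src/*.d'))     # compile and link all D sources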

But in a large project, like the one I work with at my job, containing
500,000 source files (not counting data files and the like that also
need to be processed by the build system), the difference can become
very pronounced.  In our case, we use make, which doesn't scan file
contents, and recursive make at that, so the initial scanning pause is
not noticeable. However, this perceived speed comes at the heavy cost of
reliability. On more occasions than I'd wish anyone else to experience,
I've had problems with faulty software builds caused not by actual bugs
in the code, but merely by make not rebuilding something when it should
have, or not cleaning up stray stale files when it should, causing stale
object files to be linked instead of freshly built ones.  It has simply
become an accepted fact of life to `make clean; make`. Well actually,
it's even worse than that -- our `make clean` does *not* clean
everything that might potentially be a problem, so I have for the last
≥5 years resorted to a script that manually deletes everything that
isn't under version control.  (What makes it even sadder is that the
version control server is overloaded and it's faster to delete files
locally than to check out a fresh copy of the workspace, which is
essentially what my script amounts to.)

At one point I was on the verge of proposing SCons as a make
replacement, but balked when initial research into the prospect showed
that SCons consistently had performance issues with needing to scan the
entire source tree before it begins building.  That, coupled with
general resistance to change in your average programmer workforce and
their general unfamiliarity with make alternatives, made me back off
from making the proposal.

Had tup been around at that time, it would likely have turned the tables
with its killer combo of sublinear (relative to workspace size) scanning
and reliability.  That's why I think that any modern build system that's
going to last into the future must have these two features, at a
minimum.


> > I think it should be obvious that an approach whose complexity is
> > proportional to the size of the changeset is preferable to an
> > approach whose complexity is proportional to the size of the entire
> > source tree, esp.  given the large sizes of today's typical software
> > projects.  If I modify 1 file in a project of 10,000 source files,
> > rebuilding should not be orders of magnitude slower than if I modify
> > 1 file in a project of 100 files.
> 
> It is obvious, but complexity is not everything; wall clock time is
> arguably more important.

Using inotify to update your dependency tree costs essentially zero wall
clock time, because the monitoring happens in the background. You can't
beat that with anything that requires scanning upon invocation of the
build tool.
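
To make that concrete, here's a minimal sketch (Python, Linux-only,
calling into libc directly -- not tup's actual implementation) of how a
monitor can sit on an inotify descriptor and be told which files
changed, instead of rescanning the whole tree on every build:

	# Sketch only (Linux): watch a source directory via inotify and collect
	# changed files as they happen, instead of rescanning the whole tree.
	import ctypes, os, struct

	IN_CLOSE_WRITE = 0x00000008   # "a file opened for writing was closed"

	libc = ctypes.CDLL("libc.so.6", use_errno=True)
	fd = libc.inotify_init()
	libc.inotify_add_watch(fd, b"./src", IN_CLOSE_WRITE)

	while True:
	    buf = os.read(fd, 4096)   # blocks until something actually changes
	    offset = 0
	    while offset < len(buf):
	        # struct inotify_event: int wd; uint32 mask, cookie, len; char name[]
	        wd, mask, cookie, length = struct.unpack_from("iIII", buf, offset)
	        name = buf[offset + 16 : offset + 16 + length].rstrip(b"\0").decode()
	        print("dirty:", name)  # a real tool would mark this node for rebuild
	        offset += 16 + length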


> As is actual build time versus preparation time. SCons does indeed
> have a large up-front ADG check time for large projects. I believe
> there is the Parts overlay on SCons for dealing with big projects. I
> believe the plan for later in the year is for the most useful parts of
> Parts to become part of the main SCons system. 

But still, the fundamental design limitation remains: scanning time is
proportional to workspace size, as opposed to being proportional to
changeset size.  Judging by current trends in software sizes, this issue
is only going to become increasingly important.


> > In this sense, while SCons is far superior to make in terms of
> > usability and reliability, its core algorithm is still inferior to
> > tools like tup.
> 
> However Tup is not getting traction compared to CMake (and either Make
> or preferably Ninja backend – I wonder if there is a Tup backend).

I mentioned Tup as an example of a superior build algorithm to the
decades-old make model. I'm not partial to Tup itself, and it doesn't
concern me whether or not it's gaining traction.  What I'm more
concerned with is whether the underlying algorithm of (insert whatever
build system you prefer here) is going to remain relevant going forward.


> > Now, I've not actually used tup myself other than a cursory glance
> > at how it works, so there may be other areas in which it's inferior
> > to SCons.  But the important thing is that it gets us away from the
> > O(N) of traditional build systems that requires scanning the entire
> > source tree, to the O(n) that's proportional to the size of the
> > changeset. The former approach is clearly not scalable. We ought to
> > be able to update the dependency graph in proportion to how many
> > nodes have changed; it should not require rebuilding the entire
> > graph every time you invoke the build.
> 
> I am not using Tup much simply because I have not started using it
> much, I just use SCons, Meson, and, when I have to, CMake/Ninja. In the
> end my projects are just not big enough for me to investigate the
> faster build times Tup reputedly brings.

Given its simplicity and lack of historical baggage, I'm expecting Tup
to be pretty fast, if not on par with existing make-based designs, when
it comes to small to medium projects.  But for large projects of
today's scale, I'm expecting Tup to outstrip its competitors by orders
of magnitude, *while still maintaining build reliability*. (It *may* be
possible to beat Tup in speed if you sacrifice reliability, but I don't
consider that option viable.)
Tup is getting pretty close to doing the absolute minimum work you need
to do in order for a code change to be reflected in the build products.
Any less than that, and you start risking unreliable builds (i.e.
outdated build products are not rebuilt).


[...]
> > Preferably, checking dependencies ought not to be done at all unless
> > the developer calls for it. Network access is slow, and I find it
> > intolerable when it's not even necessary in the first place.  Why
> > should it need to access the network just because I changed 1 line
> > of code and need to rebuild?
> 
> This was the reason for Waf: split the SCons system into a
> configuration step and a build step à la Autotools. CMake also does this.
> As does Meson. I have a preference for this way. And yet I still use
> SCons quite a lot!

IMO, if a build system relies on network access as part of its
dependency graph, then something has gone horribly wrong. (Aside from
NFS and the like, of course.)  Updating libraries is IMO not the build
system's job; that's what a package manager is supposed to be doing.
The build system should be concerned solely with producing build
products, given the current state of the source tree.  It has no
business going about *updating* the source tree from the network
willy-nilly just because it can.  That's simply an unworkable model -- I
could be in the middle of debugging something, then I rebuild and
suddenly the bug can no longer be reproduced, because the build tool has
"helpfully" replaced one of my libraries with a new version, shifting
the location of the bug and wasting the hours of work I had put into
narrowing down its locus.


[...]
> > The documentation does not help in this respect. The only thing I
> > could find was a scanty description of how to invoke dub in its most
> > basic forms, with little or no information (or hard-to-find
> > information) on how to configure it more precisely.  Also, why
> > should I need to hardcode a specific version of a dependent library
> > just to suppress network access when rebuilding?! Sometimes I *do*
> > want to have the latest libraries pulled in -- *when* I ask for it
> > -- just not every single time I build.
> 
> If Dub really is to become the system for D as Cargo is for Rust, it
> clearly needs more people to work on it and evolve the code and the
> documentation. Whilst no-one does stuff, the result will be rhetorical
> ranting on the email lists.

The problem is that I have fundamental disagreements with dub's design,
and therefore find it difficult to bring myself to work on its code,
since my first inclination would be to rip its guts out and rewrite from
scratch, which I don't think Sönke would take kindly to, much less merge
into the official repo.  I suppose if I were pressed I could bring
myself to contribute to its documentation, but right now, I've switched
back to SCons for my builds and basically confined dub to a dummy empty
project that fetches and builds my dependent libraries and nothing else.
This setup works well for me, so I don't really have much motivation to
improve dub's docs or otherwise improve dub -- I won't be using it very
much after all.


[...]
> > AFAIK, the only standard that Dub is, is a packaging system for D.
> > I find it quite weak as a build tool.  That's the problem, it tries
> > to do too much.  It would have been nice if it stuck to just dealing
> > with packaging, rather than trying to do builds too, and doing it
> > IMO rather poorly.
> 
> No argument from me there, except Cargo. Cargo does a surprisingly
> good job of being a package management and build system. Even the go
> command is quite good at it for Go. So I am re-assessing my old
> dislike of this way – I used to be a "separate package management and
> build, and leave build to build systems" person, I guess I still am
> really. However Cargo is challenging my view, where Dub currently does
> not.
[...]

Then perhaps you should submit PRs to dub to make it more Cargo-like.
;-)


[...]
> > Honestly, I don't care to have a "standard" build system for D. A
> > library should be able to produce a .so or .a, and have an import
> > path, and I couldn't care less how that happens; the library could
> > be built by a hardcoded shell script for all I care. All I should
> > need to do in my code is to link to that .so or .a and specify -I
> > with the right import path(s). Why should upstream libraries dictate
> > how my code is built?!
> 
> This last point is one of the biggest problems with the current Dub
> system, and a reason many people have no intention of using Dub for
> build.
> 
> Your earlier points in this paragraph should be turned into issues on
> the Dub source repository, and indeed the last one as well. And then
> we should create pull requests.

Good idea.  Though I can't see this happening without rather intrusive
changes to the way dub works, so I'm not sure whether Sönke would be
open to it.  But submitting issues to that effect wouldn't hurt.


> I actually think a standard way is a good thing, but that there should
> be other ones as well. SCons, CMake, Meson, etc. all need ways of
> building D for those who do not want to use the standard way. Seems
> reasonable to me. However SCons and Meson support for D is not yet as
> good as it could be, and last time I tried, CMake-D didn't
> work for me.

A standard way to build would be fine if we were starting out from
scratch, in a brand new ecosystem, like Rust.  The problem is, D has
supported C/C++-style builds since day 1, D codebases have been around
far longer than dub has, and those codebases have become entrenched
in the way they are built.  So for dub (or any other packaging / build
system, really) to come along and be gratuitously incompatible with how
existing build systems work is a big showstopper, and gives off the
impression of being a walled garden -- either you embrace it fully to
the exclusion of all else, or you're left out in the cold.


> > To this end, a standard way of exporting import paths in a D library
> > (it can be as simple as a text file in the code repo, or some script
> > or tool akin to llvm-config or sdl-config that spits out a list of
> > paths / libraries / etc) would go much further than trying to
> > shoehorn everything into a single build system.
> 
> So let's do it rather than just talk about it?
[...]

Sure.  Since this information is ostensibly already present in a dub
project (encoded somewhere in dub.json or dub.sdl), it seems to make
little sense to introduce yet another new thing that nobody implements.
So a first step might be to enhance dub with a subcommand that outputs
import paths / linker paths in a machine-readable format.  Then
existing dub projects can be immediately made accessible to external
build systems by having said build systems invoke:

	dub config	# or whatever verb is chosen for this purpose

Perhaps, to spare existing build scripts from having to parse JSON or
the like, we could provide finer-grained subcommands, like:

	dub config import-paths
	dub config linker-paths
	dub config dynamic-library-paths

and it would output, respectively, something along the lines of:

	/path/to/somelibrary/src
	/path/to/someotherlib/src
	/path/to/yetanotherlib/submodule1/import
	/path/to/yetanotherlib/submodule2/import

	/path/to/somelibrary/generated/os/64/lib
	/path/to/someotherlib/generated/lib
	/path/to/yetanotherlib/generated/sub/module1/out
	/path/to/yetanotherlib/generated/sub/module2/out

	-lsomelibrary
	-lsomeotherlib
	-lyetanotherlib


Not 100% sure what to do with existing non-dub projects. Perhaps a text
file in some standard location.
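
To sketch how an external build system might consume the proposed
output (again, `dub config` is the hypothetical verb from above, and the
variable names assume SCons' stock dmd tool):

	# SConstruct sketch: pull import/linker paths from the proposed
	# `dub config` subcommand (hypothetical) and feed them to SCons.
	import subprocess

	def dub_config(kind):
	    # Assumes the proposed command prints one entry per line.
	    out = subprocess.check_output(['dub', 'config', kind],
	                                  universal_newlines=True)
	    return [line for line in out.splitlines() if line.strip()]

	env = Environment(tools=['dmd', 'link'])
	env.Append(DPATH=dub_config('import-paths'))     # import (-I) directories
	env.Append(LIBPATH=dub_config('linker-paths'))   # library search directories
	env.Program('myapp', Glob('src/*.d'))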


T

-- 
In theory, software is implemented according to the design that has been
carefully worked out beforehand. In practice, design documents are
written after the fact to describe the sorry mess that has gone on
before.

