A brief survey of build tools, focused on D
H. S. Teoh
hsteoh at quickfur.ath.cx
Mon Dec 10 21:01:08 UTC 2018
On Mon, Dec 10, 2018 at 06:27:48PM +0000, Neia Neutuladh via Digitalmars-d-announce wrote:
> I wrote a post about language-agnostic (or, more accurately, cross-
> language) build tools, primarily using D as an example and Dub as a
> benchmark.
>
> Spoiler: dub wins in speed, simplicity, dependency management, and
> actually working without modifying the tool's source code.
>
> https://blog.ikeran.org/?p=339
Wow. Thanks for the writeup that convinces me that I don't need to
waste time looking at Meson/Ninja.
I find the current landscape of build systems pretty dismal. Dub may be
simple to use, but speed, seriously?! If *that's* the generally accepted
standard of build speed out there these days, then hope is slim.
Convenience and simplicity, sure. But speed? I'm sorry to say, I tried
dub for 2 days and gave up in frustration because it was making my
builds *several times longer* than a custom SCons script. I find that
completely unacceptable.
It also requires network access. On *every* invocation, unless
explicitly turned off. And even then, it performs time-consuming
dependency resolutions on every invocation, which doubles or triples
incremental build times. Again, unacceptable.
Then it requires a specific source layout, with incomplete /
non-existent configuration options for alternatives. Which makes it
unusable for existing code bases. Unacceptable.
Worst of all, it does not support custom build actions, which is a
requirement for many of my projects. It does not support polyglot
projects. It either does not support explicit control over exact build
commands, or any such support is so poorly documented it might as well
not exist. This is not only unacceptable, it is a show-stopper.
This leaves package management as the only thing about dub that I
could even remotely recommend (and even that is too unconfigurable for
my tastes -- basically, it's a matter of "take my way or the highway" --
but I'll give it credit for at least being *usable*, if not very
pleasant). But given its limitations, it means many of my projects
*cannot* ever be dub projects, because they require multiple language
support and/or code generation rules that are not expressible as a dub
build. Which means the package management feature is mostly useless as
far as my projects are concerned -- if I ever have a dependency that
requires code generation and/or multiple languages, dub is out of the
question. So I'm back to square one as far as dependency management and
build system are concerned.
This dismal state of affairs means that if my code ever depends on a dub
package (I do have a vibe.d project that does), I have to use dub as a
secondary tool -- and even here dub is so inflexible that I could not
coax it to work nicely with the rest of my build system. In my vibe.d
project I had to resort to creating a dummy empty project in a
subdirectory, whose sole purpose is to declare dependency on vibe.d so
that I can run dub to download and build vibe.d (and generate a dummy
executable that does nothing). Then I have to manually link in the
vibe.d build products in my real build system as a separate step.
//
Taking a step back, this state of affairs is completely ridiculous. The
various build systems out there are gratuitously incompatible with each
other, and having dependencies that cross build system boundaries is
completely unthinkable, even though at its core, it's exactly the same
miserable old directed acyclic graph, solved by the same old standard
graph algorithms. Why shouldn't we be able to integrate subgraphs of
different origins into a single, unified dependency graph, with standard
solutions by standard graph algorithms? Why should build systems be
effectively walled gardens, with artificial barriers that prevent you
from importing a Gradle dependency into a dub project, and importing
*that* into an SCons project, for example?
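To make the "same old graph" point concrete, here is a minimal Python sketch, assuming each tool could export its dependency subgraph as a plain target-to-dependencies map. The three subgraphs and the merge_graphs helper are hypothetical; no current tool is claimed to export this.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def merge_graphs(*graphs):
    """Union several {target: set_of_dependencies} maps into one graph."""
    merged = {}
    for graph in graphs:
        for target, deps in graph.items():
            merged.setdefault(target, set()).update(deps)
    return merged

# Hypothetical subgraphs, as three different tools might export them.
gradle_graph = {"parser.jar": {"Parser.java"}}
dub_graph = {"libapp.a": {"app.d", "parser.jar"}}
scons_graph = {"installer.tar": {"libapp.a", "README"}}

unified = merge_graphs(gradle_graph, dub_graph, scons_graph)
# Dependencies always come out before the targets that need them.
build_order = list(TopologicalSorter(unified).static_order())
```

Once the subgraphs are merged, nothing in the ordering step knows or cares which tool each edge came from.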
After so many decades of "advancement", we're still stuck in the
gratuitously incompatible walled gardens, like the gratuitous browser
incompatibilities of the pre-W3C days of the Web. And on modern CPUs
with GHz clock speeds, RAM measured in GBs, and gigabit download speeds,
building Hello World with a system like dub (or Gradle, for that matter)
is still just as slow as (if not slower than!) running make back in the 90's
on a 4 *kHz* processor. It's ridiculous.
Why can't modern source code come equipped with dependency information
in a *standard format* that can be understood by *any* build system?
Build systems shouldn't need to reinvent their own gratuitously
incompatible DSL just to express what's fundamentally the same old
decades-worn directed graph. And programmers shouldn't need to repeat
themselves by manually enumerating individual graph edges (like Meson
apparently does). It should be the compilers that generate this
information -- RELIABLY -- in a standard format that can be processed by
any tool that understands the common format. You should be able to
download a source package from *any* repository, and be able to build it
by reading only the build description in a standardized, automatable
format. Any project should be able to depend on any other project
regardless of their respective build systems.
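Part of this information is already emitted by some compilers: GCC's -MD option, for instance, writes make-style dependency rules as a side effect of compilation. The sketch below, with a hypothetical parse_depfile helper and made-up sample input, turns such rules into JSON that any tool could read. It is deliberately simplified and ignores escaped spaces and paths containing colons.

```python
import json

def parse_depfile(text):
    """Parse make-style 'target: dep dep ...' rules into {target: [deps]}."""
    joined = text.replace("\\\n", " ")  # undo backslash line continuations
    graph = {}
    for line in joined.splitlines():
        if ":" not in line:
            continue  # blank or unrelated line
        target, _, deps = line.partition(":")
        graph.setdefault(target.strip(), []).extend(deps.split())
    return graph

# Made-up sample in the style of a compiler-generated depfile.
sample = "app.o: app.c util.h \\\n  config.h\nutil.o: util.c util.h\n"
graph = parse_depfile(sample)
portable = json.dumps(graph, indent=2, sort_keys=True)
```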
(And don't get me started on reliable, reproducible builds -- which with
the majority of today's build systems either require a fresh checkout,
or an excruciatingly long rescanning of entire source trees, just to do
it right. 100% reproducible incremental builds should be a hard
requirement in this day and age, as should incremental build times that
are proportional to the changeset size, rather than source tree size.
And almost no build system handles reliable builds correctly when the
build description is changed -- Button does, but it's in the extreme
minority, and is still a pretty young project that's not widely known).
My vision of the ideal build system:
- Reproducible: no matter what state your workspace is in (fresh
checkout, long-time workspace possibly with temporary / outdated /
partial build products), the build should always produce the same
result relative to the current state of the source code. There should
be no heisenbugs caused by linking in stale versions of libraries /
object files. The build products depend only on the current state of
the source code; the presence / absence of stale / partial build
products should have no effect on the final build products. Where
this point contradicts any of the other points below (esp. Efficient),
it trumps them all.
- Resilient: if the build description changes, the build tool should
automatically remove stale (possibly intermediate) build products. It
should not leave outdated build products (like .o files of .d files
that have since been removed from the source tree) lying around. This
is essentially the same as Reproducible, but I list it explicitly
because many build systems that are otherwise Reproducible fail this
point.
- Efficient: the amount of work done by the build should be proportional
to the size of changes made to the source code since the last build,
NOT proportional to the size of the entire source tree (SCons fails in
this regard). In particular, there should be no need to 'make clean;
make' just to fulfill the Reproducible requirement. And the build tool
should not have to scan the entire source tree every single time just
to be able to build the dependency graph. (It may do this the first
time it is run on a workspace in order to build the initial DAG. But
thereafter it should be able to incrementally update the DAG without
needing to scan the entire workspace again.)
- Incremental: if I ask for build product X, the build system should
perform only the minimal actions required to build X. It should skip
over any actions that are needed only by build products I didn't ask
for. In particular, if I'm currently working on a submodule of the
project, I should be able to tell the build system to only build that
submodule and nothing more, even if some of the code changes may
affect other modules (and require those modules to be rebuilt). Almost
the same as Efficient but more specific (Efficient can be satisfied by
building *all* build products in a minimal way; this point requires
the ability to build only one build product among many).
- Parallel: the build system ought to be able to take advantage of
multicore / multithreaded CPUs and execute build actions in parallel,
when they are from parts of the dependency graph that don't depend on
each other. It must be able to do this by default, and require no
programmer intervention to support, and it must do so in a correct way
-- there must be no race conditions in build actions, and the final
build products must be identical to those produced by a serial build.
- Language-agnostic: the build system should be essentially a dependency
graph resolver. It should be able to compile (possibly via plugins)
source code of any language using any given compiler, provided such a
combination is at all possible. In fact, at its core, it shouldn't
even have the concept of "compilation" at all; it should be able to
generate, e.g., .png files from POVRay scene description files, run
image post-processing tools on them, then package them into a tarball
and upload it to a remote webserver -- all driven by the same
underlying DAG. Of course, it can have plugins of various sorts that
can allow it to infer how to do this without needing the programmer to
spell out explicit shell commands. And it may come with a set of
default plugins for some default-chosen set of language(s) / build
actions. But it should not be restricted merely to this set.
- Automated: as much as possible, the programmer shouldn't have to spell
out individual build rules. Plugins should be able to infer build
rules based on file types / simplified parsing, from imported build
descriptions, or from learning from previous build actions. Plugins
should be able to scan directories to identify input source files and
automatically infer build rules for them, as much as is possible.
- Extensible: the programmer ought to be able to easily create plugins
or new build actions when necessary. I.e., it should be possible to
explicitly specify the edge labels on the dependency graph when the
build system cannot automatically infer them. Basically the same as
Language-agnostic, but specifically requires that the set of plugins
must be open, not closed, and creating such a plugin should not
require onerous effort. It should also be possible to override /
configure the behaviour of existing plugins (e.g., change compile
flags, etc.).
- Transitive: if a current build action converts files of type Y to
products of type Z, it should be possible to introduce a new action
that produces files of type Y from files of type X, and the build
system should be able to correctly derive files of type Z from files
of type X. In particular, any dependencies between files of types X,
Y, and Z must be handled correctly. A possible failure case is if a
plugin scans a directory for files of type Y, but fails to pick up one
or more files of type Y that are generated as part of the build from a
file of type X (because that file hasn't been generated yet when the
plugin scans the directory). This places constraints on permissible
plugin architectures.
- Universal: if the build system is able to successfully produce the
build product(s), it should be able to emit a standardized
representation of the dependency graph (including the exact executed
commands) describing exactly how it arrived at this state.
Furthermore, it should be able to take such a description as input and
be able to recreate the exact same build from a copy of the source
tree (in any state -- including possibly partial build products or
unrelated files), regardless of whether the description was generated
by this build system specifically.
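As an illustration of the Incremental point in the list above, here is a rough Python sketch of restricting a build to what one requested product needs. The graph contents and the needed_for helper are hypothetical.

```python
def needed_for(target, graph):
    """Return the target plus everything it transitively depends on."""
    needed, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(graph.get(node, ()))  # source files have no entry
    return needed

# Hypothetical project: one executable plus separately generated docs.
graph = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.d"},
    "util.o": {"util.d"},
    "docs.html": {"main.d", "util.d"},
}
subgraph = needed_for("main.o", graph)
```

Asking for main.o touches only main.o and main.d; util.o, app and docs.html are never considered, even if their inputs have changed.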
Reproducible builds have been possible since the days of Cons / SCons
(among others). Sadly, some contemporary build tools still fail this
point.
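One common way to get this property is to decide staleness from content signatures rather than timestamps, in the spirit of what SCons does. The following is a minimal sketch under that assumption, not SCons's actual implementation; the function names and the recorded-signature store are hypothetical.

```python
import hashlib

def signature(command, input_contents):
    """Hash the build command together with the bytes of every input."""
    digest = hashlib.sha256(command.encode())
    for name in sorted(input_contents):
        digest.update(b"\0" + name.encode() + b"\0")
        digest.update(input_contents[name])
    return digest.hexdigest()

def is_stale(target, command, input_contents, recorded):
    """A target is stale unless its recorded signature matches exactly."""
    return recorded.get(target) != signature(command, input_contents)

# Record the signature of a (pretend) successful build.
recorded = {}
inputs = {"app.d": b"void main() {}"}
recorded["app"] = signature("dmd app.d", inputs)
```

Because the command is part of the signature, changing a compiler flag invalidates the product just as editing the source does.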
Resilient builds are definitely possible -- Button does it.
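The core idea can be sketched in a few lines of Python: remember which outputs the previous build produced, and treat any that the current build description no longer mentions as garbage. This is only an illustration of the principle, not a description of how Button implements it; the names are hypothetical.

```python
def stale_products(previous_outputs, current_graph):
    """Outputs recorded by the last build that no rule produces any more."""
    return sorted(set(previous_outputs) - set(current_graph))

# old.d was deleted from the source tree, so its rule is gone too.
previous_outputs = ["app", "main.o", "old.o"]
current_graph = {"app": {"main.o"}, "main.o": {"main.d"}}
to_delete = stale_products(previous_outputs, current_graph)
```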
Efficient builds have been proven possible by tup and its derivatives.
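A rough Python sketch of the underlying idea: start from the files known to have changed and walk the graph upward, so that untouched parts of the tree are never visited. The helper and graph are hypothetical, and this is not claimed to be tup's actual algorithm.

```python
from collections import defaultdict, deque

def affected_by(changed, graph):
    """Return every target downstream of the changed files."""
    # Reverse index: dependency -> targets that consume it. A real tool would
    # persist this between runs instead of rebuilding it (assumption).
    dependents = defaultdict(set)
    for target, deps in graph.items():
        for dep in deps:
            dependents[dep].add(target)
    affected, queue = set(), deque(changed)
    while queue:
        node = queue.popleft()
        for target in dependents.get(node, ()):
            if target not in affected:
                affected.add(target)
                queue.append(target)
    return affected

graph = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.d"},
    "util.o": {"util.d"},
}
dirty = affected_by({"util.d"}, graph)
```

Given a persisted reverse index and a list of changed files (from a file-system monitor, say), the work done is proportional to the affected part of the graph only.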
Incremental builds (of the Reproducible sort -- make fails this point)
have also been available since Cons / SCons and others from that era,
and so have Parallel builds.
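For the Parallel part, a minimal sketch using only the Python standard library: actions run concurrently, but a node is only scheduled once all of its dependencies have finished. The parallel_build function and the action callback are hypothetical; here the action is also invoked for source files, where a real tool would do nothing.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def parallel_build(graph, action, workers=4):
    """Run action(node) for every node, never before its dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph is not a DAG
    finished = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {}
        while sorter.is_active():
            for node in sorter.get_ready():
                pending[pool.submit(action, node)] = node
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any failure from the action
                node = pending.pop(future)
                finished.append(node)
                sorter.done(node)  # unblocks targets waiting on this node
    return finished

graph = {"app": {"main.o", "util.o"}, "main.o": {"main.d"}, "util.o": {"util.d"}}
order = parallel_build(graph, lambda node: node)
```

The completion order may differ between runs, but the dependency constraints hold in every run.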
Language-agnostic is frequently neglected, though newer build tools seem
to support it reasonably well. Extensible likewise.
Automated builds have been around for a while, mostly with
language-specific systems like Gradle or dub. But language-agnostic
build tools that also support (partial) automatic builds are rare --
SCons somewhat fits the bill, but it leaves much to be desired.
Transitive builds primarily concern build system plugins and automated
dependency generation. I'm not sure if any build systems fully support
it (SCons ostensibly does, but recently I discovered that its Java
builder fails this point because it does not detect .java files that are
generated by another build rule, and consequently causes the build to
fail intermittently or violate Reproducible).
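The failure mode can be shown with a small Python sketch. A plugin that only scans the disk misses inputs that have not been generated yet, whereas one that also consults the declared outputs of other rules sees them. The file names, rule format and both helpers are hypothetical, and this is not a model of SCons's Java builder.

```python
def scan_inputs(existing_files, suffix):
    """Naive plugin: sees only files that already exist on disk."""
    return sorted(f for f in existing_files if f.endswith(suffix))

def planned_inputs(existing_files, rules, suffix):
    """Also count files that some rule has promised to generate."""
    promised = {out for outputs, _inputs in rules for out in outputs}
    candidates = set(existing_files) | promised
    return sorted(f for f in candidates if f.endswith(suffix))

# Hypothetical fresh checkout: Lexer.java is generated from lexer.x.
on_disk = ["Main.java", "lexer.x"]
rules = [(["Lexer.java"], ["lexer.x"])]  # (outputs, inputs)

seen_by_scan = scan_inputs(on_disk, ".java")
seen_by_plan = planned_inputs(on_disk, rules, ".java")
```

On a fresh checkout the scan finds only Main.java, so the result depends on whether an earlier build already left Lexer.java behind, which is exactly the kind of workspace-dependent behaviour that Reproducible forbids.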
I currently know of no build tools that are Universal. But there's no
technical reason why it cannot be achieved. The technology has existed
for a long time.
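As a sketch of what such an exchange format might look like, the Python below writes a build out as JSON (outputs, inputs and the exact command per rule) and reads it back. The schema, the version field and both function names are invented for illustration; no existing standard is implied.

```python
import json

def export_build(rules):
    """Serialise rules as JSON; each rule lists outputs, inputs, command."""
    ordered = sorted(rules, key=lambda rule: rule["outputs"])
    return json.dumps({"version": 1, "rules": ordered}, indent=2)

def import_build(text):
    """Read a description back, whichever tool originally wrote it."""
    doc = json.loads(text)
    if doc.get("version") != 1:
        raise ValueError("unsupported build description version")
    return doc["rules"]

# Hypothetical two-step build, recorded with the exact commands executed.
rules = [
    {"outputs": ["main.o"], "inputs": ["main.d"], "command": "dmd -c main.d"},
    {"outputs": ["app"], "inputs": ["main.o"], "command": "dmd -ofapp main.o"},
]
description = export_build(rules)
restored = import_build(description)
```

A second tool that understands only this description has everything it needs to rebuild the dependency graph and replay the commands.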
T
--
Today's society is one of specialization: as you grow, you learn more and more about less and less. Eventually, you know everything about nothing.