A brief survey of build tools, focused on D

H. S. Teoh hsteoh at quickfur.ath.cx
Mon Dec 10 21:01:08 UTC 2018


On Mon, Dec 10, 2018 at 06:27:48PM +0000, Neia Neutuladh via Digitalmars-d-announce wrote:
> I wrote a post about language-agnostic (or, more accurately, cross-
> language) build tools, primarily using D as an example and Dub as a
> benchmark.
> 
> Spoiler: dub wins in speed, simplicity, dependency management, and
> actually working without modifying the tool's source code.
> 
> https://blog.ikeran.org/?p=339

Wow.  Thanks for the writeup that convinces me that I don't need to
waste time looking at Meson/Ninja.

I find the current landscape of build systems pretty dismal. Dub may be
simple to use, but speed, seriously?! If *that's* the generally accepted
standard of build speed out there these days, then hope is slim.

Convenience and simplicity, sure.  But speed? I'm sorry to say, I tried
dub for 2 days and gave up in frustration because it was making my
builds *several times longer* than a custom SCons script.  I find that
completely unacceptable.

It also requires network access.  On *every* invocation, unless
explicitly turned off.  And even then, it performs time-consuming
dependency resolutions on every invocation, which doubles or triples
incremental build times.  Again, unacceptable.

Then it requires a specific source layout, with incomplete /
non-existent configuration options for alternatives.  Which makes it
unusable for existing code bases.  Unacceptable.

Worst of all, it does not support custom build actions, which is a
requirement for many of my projects.  It does not support polyglot
projects. It either does not support explicit control over exact build
commands, or any such support is so poorly documented it might as well
not exist.  This is not only unacceptable, it is a show-stopper.

This leaves package management as the only thing about dub that I
could even remotely recommend (and even that is too unconfigurable for
my tastes -- basically, it's a matter of "take my way or the highway" --
but I'll give it credit for at least being *usable*, if not very
pleasant).  But given its limitations, it means many of my projects
*cannot* ever be dub projects, because they require multiple language
support and/or code generation rules that are not expressible as a dub
build.  Which means the package management feature is mostly useless as
far as my projects are concerned -- if I ever have a dependency that
requires code generation and/or multiple languages, dub is out of the
question.  So I'm back to square one as far as dependency management and
build system are concerned.

This dismal state of affairs means that if my code ever depends on a dub
package (I do have a vibe.d project that does), I have to use dub as a
secondary tool -- and even here dub is so inflexible that I could not
coax it to work nicely with the rest of my build system.  In my vibe.d
project I had to resort to creating a dummy empty project in a
subdirectory, whose sole purpose is to declare dependency on vibe.d so
that I can run dub to download and build vibe.d (and generate a dummy
executable that does nothing). Then I have to manually link in the
vibe.d build products in my real build system as a separate step.

//

Taking a step back, this state of affairs is completely ridiculous. The
various build systems out there are gratuitously incompatible with each
other, and having dependencies that cross build system boundaries is
completely unthinkable, even though at its core, it's exactly the same
miserable old directed acyclic graph, solved by the same old standard
graph algorithms.  Why shouldn't we be able to integrate subgraphs of
different origins into a single, unified dependency graph, with standard
solutions by standard graph algorithms?  Why should build systems be
effectively walled gardens, with artificial barriers that prevent you
from importing a Gradle dependency into a dub project, and importing
*that* into an SCons project, for example?
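
A minimal sketch of that idea, assuming each tool could export its piece
of the graph as a plain mapping from a node to the nodes it depends on.
The three graphs and all file names below are hypothetical, and no
existing tool is claimed to export such a mapping:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency subgraphs, as three different tools might report
# them. Each maps a node to the set of nodes it depends on.
gradle_graph = {"libfoo.jar": {"Foo.java"}}
dub_graph = {"libbar.a": {"bar.d", "libfoo.jar"}}
scons_graph = {"app": {"main.d", "libbar.a"}}


def merge_graphs(*graphs):
    """Union several subgraphs into one dependency graph."""
    merged = {}
    for graph in graphs:
        for node, deps in graph.items():
            merged.setdefault(node, set()).update(deps)
    return merged


def build_order(graph):
    """Return the nodes in an order where dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


unified = merge_graphs(gradle_graph, dub_graph, scons_graph)
order = build_order(unified)
```

Once the subgraphs are merged, the origin of each edge no longer
matters: a standard topological sort (Python 3.9+ ships one in graphlib)
orders the whole thing.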

After so many decades of "advancement", we're still stuck in the
gratuitously incompatible walled gardens, like the gratuitous browser
incompatibilities of the pre-W3C days of the Web. And on modern CPUs
with GHz clock speeds, RAM measured in GBs, and gigabit download speeds,
building Hello World with a system like dub (or Gradle, for that matter)
is still just as slow (if not slower!) as running make back in the 90's
on a 4 *MHz* processor.  It's ridiculous.

Why can't modern source code come equipped with dependency information
in a *standard format* that can be understood by *any* build system?
Build systems shouldn't need to reinvent their own gratuitously
incompatible DSL just to express what's fundamentally the same old
decades-worn directed graph. And programmers shouldn't need to repeat
themselves by manually enumerating individual graph edges (like Meson
apparently does). It should be the compilers that generate this
information -- RELIABLY -- in a standard format that can be processed by
any tool that understands the common format.  You should be able to
download a source package from *any* repository, and be able to build it
by reading only the build description in standardized, automatable
format.  Any project should be able to depend on any other project
regardless of their respective build systems.
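
One existing approximation of compiler-generated dependency information
is the Makefile-style dependency file that gcc writes with -MD / -MMD.
The sketch below reads that "target: dep dep ..." shape into a graph. It
is deliberately simplified: the function name is made up, and it does
not handle escaped spaces or paths containing colons.

```python
def parse_depfile(text):
    """Parse 'target: dep dep ...' rules, joining backslash-continued lines."""
    graph = {}
    # A backslash at end of line continues the rule on the next line.
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        if ":" not in line:
            continue
        target, deps = line.split(":", 1)
        graph.setdefault(target.strip(), set()).update(deps.split())
    return graph
```

The point is only that the result is an ordinary node-to-dependencies
mapping that any graph-based tool could consume.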

(And don't get me started on reliable, reproducible builds -- which with
the majority of today's build systems either require a fresh checkout,
or an excruciatingly long rescanning of entire source trees, just to do
it right. 100% reproducible incremental builds should be a hard
requirement in this day and age, as should incremental build times that
are proportional to the changeset size, rather than source tree size.
And almost no build system handles reliable builds correctly when the
build description is changed -- Button does, but it's in the extreme
minority, and is still a pretty young project that's not widely known).

My vision of the ideal build system:

- Reproducible: no matter what state your workspace is in (fresh
  checkout, long-time workspace possibly with temporary / outdated /
  partial build products), the build should always produce the same
  result relative to the current state of the source code. There should
  be no heisenbugs caused by linking in stale versions of libraries /
  object files.  The build products depend only on the current state of
  the source code; the presence / absence of stale / partial build
  products should have no effect on the final build products.  Where
  this point contradicts any of the other points below (esp. Efficient),
  it trumps them all.

- Resilient: if the build description changes, the build tool should
  automatically remove stale (possibly intermediate) build products. It
  should not leave outdated build products (like .o files of .d files
  that have since been removed from the source tree) lying around.  This
  is essentially the same as Reproducible, but I list it explicitly
  because many build systems that are otherwise Reproducible fail this
  point.

- Efficient: the amount of work done by the build should be proportional
  to the size of changes made to the source code since the last build,
  NOT proportional to the size of the entire source tree (SCons fails in
  this regard). In particular, there should be no need to 'make clean;
  make' just to fulfill the Reproducible requirement. And the build tool
  should not have to scan the entire source tree every single time just
  to be able to build the dependency graph. (It may do this the first
  time it is run on a workspace in order to build the initial DAG. But
  thereafter it should be able to incrementally update the DAG without
  needing to scan the entire workspace again.)

- Incremental: if I ask for build product X, the build system should
  perform only the minimal actions required to build X. It should skip
  over any actions that are needed only by build products I didn't ask
  for.  In particular, if I'm currently working on a submodule of the
  project, I should be able to tell the build system to only build that
  submodule and nothing more, even if some of the code changes may
  affect other modules (and require those modules to be rebuilt). Almost
  the same as Efficient but more specific (Efficient can be satisfied by
  building *all* build products in a minimal way; this point requires
  the ability to build only one build product among many).

- Parallel: the build system ought to be able to take advantage of
  multicore / multithreaded CPUs and execute build actions in parallel,
  when they are from parts of the dependency graph that don't depend on
  each other.  It must be able to do this by default, and require no
  programmer intervention to support, and it must do so in a correct way
  -- there must be no race conditions in build actions, and the final
  build products must be identical to those produced by a serial build.

- Language-agnostic: the build system should be essentially a dependency
  graph resolver. It should be able to compile (possibly via plugins)
  source code of any language using any given compiler, provided such a
  combination is at all possible. In fact, at its core, it shouldn't
  even have the concept of "compilation" at all; it should be able to
  generate, e.g., .png files from POVRay scene description files, run
  image post-processing tools on them, then package them into a tarball
  and upload it to a remote webserver -- all driven by the same
  underlying DAG. Of course, it can have plugins of various sorts that
  can allow it to infer how to do this without needing the programmer to
  spell out explicit shell commands. And it may come with a set of
  default plugins for some default-chosen set of language(s) / build
  actions. But it should not be restricted merely to this set.

- Automated: as much as possible, the programmer shouldn't have to spell
  out individual build rules.  Plugins should be able to infer build
  rules based on file types / simplified parsing, from imported build
  descriptions, or from learning from previous build actions. Plugins
  should be able to scan directories to identify input source files and
  automatically infer build rules for them, as much as is possible.

- Extensible: the programmer ought to be able to easily create plugins
  or new build actions when necessary. I.e., it should be possible to
  explicitly specify the edge labels on the dependency graph when the
  build system cannot automatically infer it. Basically the same as
  Language-agnostic, but specifically requires that the set of plugins
  must be open, not closed, and creating such a plugin should not
  require onerous effort.  It should also be possible to override /
  configure the behaviour of existing plugins (e.g., change compile
  flags, etc.).

- Transitive: if a current build action converts files of type Y to
  products of type Z, it should be possible to introduce a new action
  that produces files of type Y from files of type X, and the build
  system should be able to correctly derive files of type Z from files
  of type X.  In particular, any dependencies between files of types X,
  Y, and Z must be handled correctly.  A possible failure case is if a
  plugin scans a directory for files of type Y, but fails to pick up one
  or more files of type Y that are generated as part of the build from a
  file of type X (because that file hasn't been generated yet when the
  plugin scans the directory). This places constraints on permissible
  plugin architectures.

- Universal: if the build system is able to successfully produce the
  build product(s), it should be able to emit a standardized
  representation of the dependency graph (including the exact executed
  commands) describing exactly how it arrived at this state.
  Furthermore, it should be able to take such a description as input and
  be able to recreate the exact same build from a copy of the source
  tree (in any state -- including possibly partial build products or
  unrelated files), regardless of whether the description was generated
  by this build system specifically.


Reproducible builds have been possible since the days of Cons / SCons
(among others). Sadly, some contemporary build tools still fail this
point.
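
A minimal sketch of the Reproducible requirement, in the spirit of the
content signatures SCons uses instead of timestamps. The function names
and the in-memory signature store are assumptions for illustration, not
SCons's actual API:

```python
import hashlib


def signature(command, input_contents):
    """Hash the build command together with the contents of every input."""
    h = hashlib.sha256()
    h.update(command.encode())
    for name in sorted(input_contents):
        h.update(b"\0" + name.encode() + b"\0")
        h.update(input_contents[name])
    return h.hexdigest()


def needs_rebuild(target, command, input_contents, stored_signatures):
    """A target is up to date only if its recorded signature still matches."""
    return stored_signatures.get(target) != signature(command, input_contents)
```

Here input_contents maps each input file name to its bytes. Because the
decision depends only on the command and the input contents, a stale
object file with a misleading timestamp cannot be mistaken for an
up-to-date one.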

Resilient builds are definitely possible -- Button does it.
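
The core of the Resilient requirement can be sketched as a set
difference between what the previous build produced and what the current
build description still produces. This is only an illustration with a
hypothetical function name; it does not describe how Button implements
it:

```python
def stale_outputs(previous_outputs, current_graph):
    """Outputs recorded by the last build that no rule produces any more."""
    return sorted(set(previous_outputs) - set(current_graph))
```

For example, if the last build recorded a.o, b.o and app, and b.d has
since been dropped from the build description, the function reports b.o
as the file to delete.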

Efficient builds have been proven possible by tup and its derivatives.
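
A minimal sketch of the Efficient requirement: start from the changed
files and walk upward through reverse edges, so the work is bounded by
what the change can reach rather than by the size of the tree. It
assumes the reverse index is kept between builds and that the set of
changed files is already known; the names are hypothetical, and this is
not a description of tup's internals:

```python
from collections import defaultdict, deque


def reverse_edges(graph):
    """Map each node to the nodes that directly depend on it."""
    dependents = defaultdict(set)
    for node, deps in graph.items():
        for dep in deps:
            dependents[dep].add(node)
    return dependents


def affected_by(changed, dependents):
    """Collect every node reachable from the changed files via dependents."""
    affected = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected
```

Touching a.d in a tree with thousands of other modules visits only a.o
and whatever links it in; the unrelated nodes are never looked at.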

Incremental builds (of the Reproducible sort -- make fails this point)
have also been available since Cons / SCons and others from that era,
and so have Parallel builds.
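
The Incremental and Parallel requirements can be sketched together:
first cut the graph down to what the requested product needs, then group
the remaining nodes into batches whose members are independent of each
other. The function names are hypothetical, and a real tool would hand
each batch to a worker pool instead of just listing it:

```python
from graphlib import TopologicalSorter


def subgraph_for(target, graph):
    """Keep only the target and what it transitively depends on."""
    needed, stack = {}, [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed[node] = set(graph.get(node, ()))
            stack.extend(needed[node])
    return needed


def parallel_batches(graph):
    """Group nodes into batches whose members do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()  # everything whose dependencies are done
        batches.append(sorted(ready))
        sorter.done(*ready)
    return batches
```

The first batch consists of source files, which need no action; every
later batch can run concurrently once the batch before it has finished.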

Language-agnostic support is frequently neglected, though newer build
tools generally seem to handle it well. The same goes for Extensible.

Automated builds have been around for a while, mostly with
language-specific systems like Gradle or dub. But language-agnostic
build tools that also support (partial) automatic builds are rare --
SCons somewhat fits the bill, but it leaves much to be desired.

Transitive builds primarily concern build system plugins and automated
dependency generation.  I'm not sure if any build systems fully support
it (SCons ostensibly does, but recently I discovered that its Java
builder fails this point because it does not detect .java files that are
generated by another build rule, and consequently causes the build to
fail intermittently or violate Reproducible).
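
One way to avoid the directory-scanning failure is to derive generated
files from the rules themselves instead of from what happens to be on
disk. A minimal sketch, assuming hypothetical one-input, one-output
rules keyed by file extension and an acyclic rule set:

```python
import os

# Hypothetical rules: input extension -> output extension.
RULES = {".x": ".y", ".y": ".z"}


def plan(source_files, rules=RULES):
    """Derive all build steps, including steps on files not generated yet."""
    steps = []
    pending = list(source_files)
    while pending:
        path = pending.pop()
        stem, ext = os.path.splitext(path)
        if ext in rules:
            output = stem + rules[ext]
            steps.append((path, output))
            pending.append(output)  # the output may feed another rule
    return sorted(steps)
```

Given a.x and b.y, the plan contains a.x -> a.y, a.y -> a.z and
b.y -> b.z, even though a.y does not exist yet when planning happens.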

I currently know of no build tools that are Universal.  But there's no
technical reason why it cannot be achieved.  The technology has existed
for a long time.
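
As a rough sketch of what Universal might look like in practice, the
graph and the exact commands can be written out as a tool-neutral
document and read back in. The JSON layout, the "version" field and the
function names are invented for illustration; no such standard exists:

```python
import json


def dump_build(graph, commands):
    """Serialize the graph and exact commands into a tool-neutral document."""
    nodes = {
        target: {"inputs": sorted(deps), "command": commands.get(target)}
        for target, deps in graph.items()
    }
    return json.dumps({"version": 1, "nodes": nodes}, sort_keys=True, indent=2)


def load_build(text):
    """Recover the graph and commands from a serialized description."""
    doc = json.loads(text)
    graph = {t: set(n["inputs"]) for t, n in doc["nodes"].items()}
    commands = {t: n["command"] for t, n in doc["nodes"].items() if n["command"]}
    return graph, commands
```

Any tool that can read this description gets back exactly the graph and
commands that the producing tool used, which is all a dependency graph
resolver needs to replay the build.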


T

-- 
Today's society is one of specialization: as you grow, you learn more and more about less and less. Eventually, you know everything about nothing.