Having a bit of fun on stackoverflow

H. S. Teoh hsteoh at quickfur.ath.cx
Wed Jun 26 21:13:43 PDT 2013


On Thu, Jun 27, 2013 at 03:00:05AM +0200, Idan Arye wrote:
> On Wednesday, 26 June 2013 at 23:40:52 UTC, H. S. Teoh wrote:
> 
> >Stop right there. As soon as "manual" enters the picture, you no
> >longer have a build process. You may have a *caricature* of a build
> >process, but it's no build process at all. I don't care if it's
> >hitting F5 or running make, if I cannot (check out the code from
> >version control / download and unpack the source tarball) and
> >*automatically* recreate the entire distribution binary by running a
> >(script / makefile / whatever), then it's not a build process.
> 
> Whether it needs to be automated to be called a build process is a
> matter of definitions. The important thing is to agree that it's bad.

No, what I meant was that going from clean source (i.e., only fresh
source files, no auto-generated files, no intermediate files, no cached
object files, clean, pristine source) to the fully-built binary should
be possible *without* manually typing any commands other than invoking
the build script / IDE build function / whatever.

IOW, builds must be reproducible. They should not rely on arbitrary
undocumented commands that the original author typed at arbitrary points
in time, that produced intermediate files that are required later. Every
single command necessary to start from pristine, unprocessed source code
to fully-functional binary must be encapsulated in the build script /
build command / whatever you call it, so that, in principle, pushing a
single button will produce the final, releasable binary. You should be
able to ship the pristine, unprocessed source code to somebody and they
should be able to get a binary out of it by "pushing the same button",
so to speak.
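
To make this concrete, here's a minimal sketch (GNU make syntax,
hypothetical file names): typing "make" is the single button that takes
you from pristine source to the final binary, and "make clean" takes
you back to pristine source.

CC     = gcc
CFLAGS = -O2 -Wall

OBJS = main.o util.o

# The single button: "make" rebuilds everything from pristine source.
# (Recipe lines below are tab-indented, as make requires.)
myprog: $(OBJS)
	$(CC) $(CFLAGS) -o $@ $(OBJS)

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

# Back to pristine source: no stale intermediate files left behind.
clean:
	rm -f myprog $(OBJS)

Trivial, yes, but note that *every* step is in there: there is no
command you have to remember to type by hand.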


[...]
> I don't consider having to write 50 commands each time you want to
> build the software that harmful - not because it's good, but because
> it's so bad that no developer will agree to live with it. And luckily
> for most developers - that's a problem they don't have to live with,
> because IDEs can handle it pretty well.
> 
> The real problem is with commands that you only have to type now and
> then.

That's what I mean. When you have commands that you "only have to type
now and then", they MUST be part of the automated build process, be it
your build script, IDE project file, or whatever it is you use to build
your program. Otherwise, it's not possible for you to just ship the
pristine source code to somebody else and have them be able to build it
just by hitting the "build" button.


> For example, let's assume you have a .lex file somewhere in your
> project. Visual Studio does not know how to handle it (I think - it has
> been years since I last touched VS, and I didn't do any advanced stuff
> with it). But VS knows pretty well how to handle everything else, and
> you don't want to start learning a build system just for that single
> .lex file - after all, it's just one command, and you don't really
> need to do it every time - after all, you rarely touch it, and the
> auto-generated .yy.c file stays in the file system for the next build.

That's the formula for disaster.

Consider:
1) You write some code;
2) You decide you need flex, but VS doesn't support calling flex, so you
   run it by hand;
3) Programmer B wants to try out your code, so you ship him the source
   files. It fails miserably 'cos he doesn't have flex installed.
4) Solution? Just include the .yy.c the next time you send him the code.
   Now it compiles. Everything's OK now, right? Wrong.
5) You make some changes to the code, but forget to rerun flex. Now the
   .yy.c is out of sync with the .lex, but it just happens to still
   compile, so you ship the new code to programmer B.
6) Programmer B compiles everything and ships the product to the
   customer.
7) In the meantime, you suddenly remember you didn't re-run flex, so you
   do that and recompile everything.
8) The customer comes back and complains there are bugs in the code. You
   can't reproduce it, 'cos your .yy.c is up-to-date now.
9) Another customer complains that the previous release of the code has
   a critical bug. You check out the old code from version control, but
   .yy.c wasn't in version control, so the old code doesn't even
   compile.
10) After hours of hair-pulling, the old code finally compiles. Of
   course, you've done all sorts of things to try to make it compile,
   but the dynamic libraries are not the same, the new version of VS
   has a different default setting, etc., so of course, you can't
   reproduce the customer's problem.
11) You give up, and check out the new code to continue working on
   something else. But the .yy.c is again out-of-sync with the .lex 'cos
   you touched it while trying to make the old version compile. The code
   compiles, but has subtle bugs caused by the out-of-sync file.
12) After you finally remember to run flex again, programmer B checks
   out the code, and now his build fails, 'cos the .yy.c is out of sync
   and causes a compile error.
13) You decide that since the .yy.c keeps causing problems, you should
   check it into the VCS.  Now everything works fine. Or does it?
14) Programmer B checks out the code, and modifies the .lex, but doesn't
   re-run flex. He checks in the changes. You check out the changes, and
   now your code doesn't work anymore, 'cos the .yy.c is out of date.

See how this is a vicious cycle of endless frustration and wasted time?

The correct way of doing things is to include EVERYTHING you need to go
from raw source files to final binary in a single build script / project
file / whatever. You have to guarantee that, given the pristine source
code (i.e. without any externally-generated products), a single button
(or script, or makefile, etc.) will be able to regenerate the binaries
you shipped. This has to work for EVERY RELEASED VERSION of your
program. You should be able to check out any prior version of your code,
and be assured that after you hit the "compile" button, the executable
you get at the end is IDENTICAL to the executable you shipped to the
customer 12 months ago.
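
To go back to the .lex example above: the fix is one extra rule in the
build script. A sketch (GNU make again, hypothetical file names):

LEX = flex

parser: parser.yy.o main.o
	$(CC) $(CFLAGS) -o $@ parser.yy.o main.o

# The .yy.c is a *generated* file: the build regenerates it whenever
# the .lex changes, "make clean" deletes it, and it never goes into
# the VCS.
parser.yy.c: parser.lex
	$(LEX) -o $@ $<

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

clean:
	rm -f parser *.o parser.yy.c

With that in place, step 5 above simply can't happen: make sees that
parser.lex is newer than parser.yy.c and reruns flex before compiling
anything that depends on it. Nobody has to remember anything.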

Anything else is just the formula for endless frustration, untraceable
bugs, and project failure. If your IDE's build function doesn't support
full end-to-end reproducible builds, it's worthless and should be
thrown out.


> So, you use the shell to call `flex`, and then compile your project
> with VS, and continue coding happily without thinking about that .lex
> file.
> 
> A few weeks pass, and you have to change something in the .lex file.
> So you change it, and compile the code, and run the project, and
> nothing changes - because you forgot to call `flex`. So you check
> your code - but everything seems OK. So you do a rebuild - because
> that usually helps in such situations - and VS deletes the .exe and
> all the .obj files, but it doesn't delete the .yy.c file - because
> it's a C source file, so VS assumes it's part of the source code -
> and then VS compiles everything from scratch - and again nothing
> changes!
> 
> So, you do what any sane programmer would do - you throw the
> computer out of the window.

Or rather, you throw the IDE out the window, 'cos its build function is
defective. :-P


> When your new computer arrives, you check out the code from the
> repository and try to compile, and this time you get a compile error -
> because you don't have the .yy.c file. Now you finally understand that
> you forgot to call `flex`!

This is a sign of a defective IDE build function.


> Well, you learned from your mistake so you won't repeat it again, so
> you say to yourself that there is still no point in introducing a
> build system just to handle a single .lex file...
> 
> That's why I'm not worried about problems that you can't live with.
> If people can't live with a problem - they will find and implement a
> solution. It's the problems you *can* live with that make me worry -
> because there will always be people who prefer to live with the
> problem than to be bothered with the solution...

Then they only have themselves to blame when they face an endless stream
of build problems, heisenbugs that appear/disappear depending on what
extra commands they typed at the command prompt, inability to track down
customer-reported bugs in old versions, and all sorts of neat and handy
things like that.


> >If you don't have a build system, your project is already doomed.
> >Nevermind auto-generated files, external libraries, or SCMs, those
> >are just nails in the coffin.  Any project that spans more than a
> >single source file (and I don't mean just code -- that includes data,
> >autogenerated files, whatever inputs are required to create the final
> >product) *needs* a build system.
> 
> With that I don't agree - simple projects that only have source
> files can get away with IDE building, even if they have multiple
> source files. I'm talking about zero configuration projects - no
> auto-generated files, no third party libraries - all you have to do
> is create a default project in the IDE, import all the source files,
> and hit F5 (or the equivalent shortcut). The moment you have to
> change a single compiler switch - you need a build system.

I'd argue that you need a build system from the get-go. Ideally, the
IDE's project file SHOULD support such things as building external
products. If it doesn't, it's essentially worthless and you should use a
real build system instead.

But even if this is supported, there's still the problem of compile
switches inserted by the IDE that you may not know about. Consider if
the IDE has a configuration window where you can select compile
switches. You twiddle with some of those settings and later forget about
them completely. Then you ship your files to developer B, and he hits
the build button and gets a different executable, 'cos his IDE settings
don't match yours.
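
The cure is the same as before: pin the switches in the
version-controlled build script, so they're part of the source tree
rather than part of somebody's IDE state. A sketch:

# These switches live in the makefile, and the makefile lives in the
# VCS: every developer, and every checkout of an old release, builds
# with EXACTLY the same flags -- not with whatever the IDE's config
# dialog happened to be set to that day.
CFLAGS = -O2 -Wall -std=c99 -DNDEBUG

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<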

This is just the same sad story rehashed. For any serious software
project, reproducible builds are a must. There's simply no way around
it. Shipping executables that depend on arbitrary IDE settings, which
vary depending on which developer built them, is a very bad business
model. Shipping executables that you cannot reproduce by checking out a
previous version of the code from the VCS is a very bad business model.
Even *developing* software for which you can't make reproducible
executables is a bad business model -- it hurts programmer productivity.
Countless hours are wasted trying to track down bugs and other strange
problems that ultimately come from non-reproducible builds. It also
hurts morale: nobody dares check out the latest code from the VCS 'cos
it has a reputation of introducing random build failures, which wastes
time (you have to "make clean; make" every single time, and if you're
dealing with C/C++, where build times are measured in hours, that just
kills productivity instantly). As a result, you get endless merge
conflicts when everybody tries to check in code that has been
out-of-sync for weeks, and everybody blames each other for the
conflicts ("argh, why did you touch this file in *my* subdirectory?!").

Not having 100% reproducible builds is simply not workable.


> I myself always use a build system, because I use Vim so I don't have
> IDE building. The only exception is single-source files of interpreted
> languages, where I can use shebangs.

Single-source files are OK without a build system, though sometimes I
still use one anyway, just so I get the compile flags right. Shebang
scripts are a different story, 'cos you can just put the compile flags
into the shebang line itself.
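
For the record, that "build system" can be as trivial as a two-line
makefile. A sketch:

# Just enough makefile to record the compile flags somewhere under
# version control, instead of in my shell history.
hello: hello.c
	$(CC) -O2 -Wall -o $@ $<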

But anything beyond that requires a *reproducible* build system (even
if it's the IDE's build command). Otherwise you're just setting yourself
up for needless frustration and failure.


T

-- 
Designer clothes: how to cover less by paying more.

