Having a bit of fun on stackoverflow

Wed Jun 26 14:26:29 PDT 2013

On Wed, Jun 26, 2013 at 10:23:21PM +0200, Idan Arye wrote:
> On Tuesday, 25 June 2013 at 12:17:24 UTC, H. S. Teoh wrote:
> >On Tue, Jun 25, 2013 at 08:55:04AM +0200, monarch_dodra wrote:
> >>On Tuesday, 25 June 2013 at 06:46:28 UTC, Jonathan M Davis
> >>wrote:
> >>>On Tuesday, June 25, 2013 08:38:01 Marco Leise wrote:
> >>>>Am Mon, 24 Jun 2013 08:45:26 -0700
> >>>>
> >>>>schrieb Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org>:
> >>>>> http://stackoverflow.com/questions/17263604/i-have-a-c-repository-but-github-says-its-d
> >>>>> > Andrei
> >>>>
> >>>>This is why you don't put automatically generated files in version
> >>>>control ... Especially when they have the file ending used by an
> >>>>indexed PL on GitHub ;)
> >>>
> >>>Yeah. That was the great faux pas of that question. I'm not aware
> >>>of any good reason to put generated files in version control unless
> >>>they were only generated once and will never be generated again.
> >>>
> >>>- Jonathan M Davis
> >>
> >>Well, depends how you use the version control I guess. You *can* use
> >>it for more than just going back in time or concurrent edits: You
> >>can use it as a redistributable network folder.
> >>
> >>The company I work for does it that way. It means when you checkout
> >>a project, you don't have to run 10+ different tools to generate
> >>whatever it needs to generate: You are ready to roll. You save on
> >>time and headaches. Whenever someone changes the xml, you don't have
> >>to regenerate everything every time you resync. The overall time and
> >>overhead wasted by a few guys checking in their generated files is
> >>more than made up for everyone else not having to worry (or even
> >>know) about it. But to each their own of course, this works for
> >>_us_.
> >[...]
> >
> >This can backfire in ugly ways if not used carefully. At my work,
> >there are some auto-generated files (tool-generated source code) that
> >get checked into version control, which generally works fine... then
> >we got into a state where the makefile builds stuff that requires the
> >generated files before they're actually generated. When somebody then
> >modifies whatever is used to generate said files but forgets to check
> >in the new version of the generated files, you get into nasty
> >nigh-untraceable inconsistencies where part of the build picks up an
> >old version of said file but the rest of the build picks up the new
> >version.
[...]
> >In general, this practice is the source of a lot of needless grief,
> >so I've come to be of the opinion that it's a bad idea.
> >
> >
> >T
> 
> I guess that depends whether or not F5 is your build process
> (http://www.codinghorror.com/blog/2007/10/the-f5-key-is-not-a-build-process.html).

What's F5?

> If you rely on your IDE to compile and run your project, then you
> usually want to check in those auto-generated files - because when
> you generated them for your local copy, you had to use different
> tools, download some libraries, configure your IDE etc - and you
> want to save other people(or yourself on another computer) the
> trouble of doing it all again - not to mention to save yourself the
> trouble of documenting exactly what you did so others can follow.

We don't use IDEs where I work. Or at least, we frown on them. :-P We
like to force developers to actually think about build processes instead
of just hitting a key and assuming everything is OK. One big reason is
that we want builds to be reproducible, not dependent on strange IDE
settings some key developer happens to have that nobody else can
replicate.

So we use makefiles... which are a royal PITA, but at least they give
you a semblance of reproducibility (fresh version control checkout, run
./configure && make, and it produces a usable product at the end). I
have a lot of gripes about makefiles and would never use such broken
decrepit technology in my own projects, but they are nevertheless better
on the reproducibility front than some IDE "build process" that nobody
knows how to replicate after the key developer leaves the company.

> On the other hand, if you use a proper build system, you can - and
> should - configure your build file to auto-generate those files using
> external tools, and maybe even use a dependency manager to download
> those libraries.

We do all that. But when you have (more than) 50 people working on the
same source tree, the dynamics are rather different. In theory, it's a
single Makefile hierarchy, but in reality it's a hodgepodge of ugly
hacks and shoehorning of poorly-written Makefiles that only barely
manage to build successfully when you do a fresh checkout.

But that's not really relevant. Here's an illustration of the problem at
hand:
(1) The external tools are built from source in the source tree;
(2) They need to be built first, then run as part of the build process
    to produce the auto-generated files;
(3) Somebody unwisely decides to check in the generated file(s).
(4) Later on, some unknowing developer comes along and says, hey look!
    file xyz.h already exists, so let's use it in my new code! -- not
    realizing that xyz.h is auto-generated *later* on in the build
    process than the new code;
(5) Some changes are necessary to whatever data the external tools use
    to produce xyz.h, so now we have a new version of xyz.h. However,
    since our developers don't directly check in stuff to version control
    (they submit patches to the reviewers), sometimes people forget to
    include the new xyz.h in the patch. So now the version of xyz.h in
    version control doesn't match the input data to the external tools.
(6) The release team updates their workspace, which pulls in the new
    data used to generate xyz.h, but xyz.h itself isn't updated. They
    run a build, and half the source tree is compiled with the wrong
    version of xyz.h, and the other half with the new version (when
    later on in the makefile xyz.h is regenerated from the new data).
(7) The build is released to the customer, who reports strange runtime
    errors with inscrutable stack traces (due to ABI mismatch).
(8) The devs can't reproduce the problem, 'cos by the time it's
    reported, they've built their workspace several hundred times, and
    the old xyz.h is long gone.  They stare at the code until their gaze
    bores two holes through their monitor, and they still can't locate
    the problem.
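
One way to catch step (5) before it reaches the release team is to have the
build (or the patch review tooling) regenerate the file and compare it with
the checked-in copy. Here is a minimal Python sketch of that idea. Note that
generate_header() is a hypothetical stand-in for the external tool, and the
xyz.dat input format is invented purely for illustration; neither reflects
the actual tools described above.

```python
import hashlib
import tempfile
from pathlib import Path


def generate_header(data_text):
    """Hypothetical stand-in for the external tool.

    Turns lines of the form 'NAME VALUE' into C #define lines.
    """
    lines = ["/* AUTO-GENERATED, do not edit */"]
    for row in data_text.splitlines():
        row = row.strip()
        if not row:
            continue
        name, value = row.split()
        lines.append(f"#define {name} {value}")
    return "\n".join(lines) + "\n"


def digest(text):
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def checked_in_copy_is_stale(data_file, header_file):
    """Return True if the checked-in header no longer matches its input."""
    expected = generate_header(data_file.read_text())
    actual = header_file.read_text()
    return digest(expected) != digest(actual)


def demo():
    """Replay step (5): the input data changes, the header does not."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "xyz.dat"    # hypothetical input file name
        header = Path(tmp) / "xyz.h"
        data.write_text("MAX_ITEMS 16\n")
        header.write_text(generate_header(data.read_text()))
        before = checked_in_copy_is_stale(data, header)
        # Somebody updates the data but forgets to include the new xyz.h.
        data.write_text("MAX_ITEMS 32\n")
        after = checked_in_copy_is_stale(data, header)
        return before, after


if __name__ == "__main__":
    print(demo())
```

A check like this only papers over the problem, of course; it merely turns a
silent ABI mismatch into a loud failure at review time.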

> Not only does the build system's ability to easily generate those
> auto-generated files make checking them in redundant - it also makes
> it more troublesome. If you had to manually configure and invoke a
> tool to generate a file, chances are you'll only do that again when
> you really have to, but if the build system does that for you -
> usually as a part of a bigger task - that file will be updated
> automatically by many people time and again.

Nobody (I hope!) is foolish enough to depend on hand-configured tools to
generate software that's to be released to customers. That's a formula
for utter abject failure. You *need* to make sure there's a
*reproducible*, *reliable* way to build your software *automatically*,
so that when a customer reports a problem in build 1234, you can
check out version 1234 from version control and reproduce exactly the
binaries that were distributed to the customer, and thereby reliably
interpret stack traces, reproduce old bugs, etc.

It would really *really* suck if version 1234 was released before
somebody reconfigured some obscure IDE setting or external tool, and now
we don't remember how to build the same version 1234 that the customer
is running.
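
As a rough illustration of what "reproduce exactly the binaries" could mean
in practice, here is a small Python sketch that hashes every file in two
build output directories and reports which ones differ. The directory layout
and file names in demo() are made up for the example, and this assumes the
build is deterministic in the first place (no embedded timestamps and the
like), which is a separate problem.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(build_dir):
    """Map each file's path (relative to build_dir) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(build_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(build_dir).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def differing_files(a, b):
    """Names that are missing from one manifest or whose digests differ."""
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))


def demo():
    """Compare a pretend 'released' build with a pretend rebuild."""
    with tempfile.TemporaryDirectory() as tmp:
        released = Path(tmp) / "released"   # hypothetical directory names
        rebuilt = Path(tmp) / "rebuilt"
        for d in (released, rebuilt):
            (d / "bin").mkdir(parents=True)
            (d / "bin" / "app").write_bytes(b"identical contents")
        # One artifact was compiled against a different xyz.h.
        (released / "libxyz.so").write_bytes(b"old xyz.h layout")
        (rebuilt / "libxyz.so").write_bytes(b"new xyz.h layout")
        return differing_files(build_manifest(released),
                               build_manifest(rebuilt))


if __name__ == "__main__":
    print(demo())
```

If a fresh checkout of version 1234 gives an empty list against the archived
release, you can trust the stack traces; if not, you at least know which
artifacts to be suspicious of.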

> Having the SCM handle such files will add redundant burden to it and
> even worse - can cause pointless merge conflicts.

That's why I said, auto-generated files should NOT be included in
version control. Unfortunately it's still being done here at my work,
and every now and then we have to deal with silly spurious merge
conflicts in addition to subtle ABI inconsistency bugs like I described
above.

T

-- 
It's amazing how careful choice of punctuation can leave you hanging: