I just created a dub package. Frankly, the whole thing is backward.

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Apr 26 16:41:50 UTC 2022


On Wed, Apr 27, 2022 at 03:57:24AM +1200, rikki cattermole via Digitalmars-d wrote:
> On 27/04/2022 3:35 AM, H. S. Teoh wrote:
[...]
> > What I mean is this: my projects often involve a main executable,
> > which is the primary target of the project, plus several helpers,
> > which are either secondary targets sharing most of the same sources,
> > or code generators that create one or more targets required to
> > compile the main executable.  Occasionally, there may also be
> > auxiliary targets like HTML pages, procedurally-generated images,
> > and other resources.
[...]
> > As far as I know -- and if I'm wrong I'd be happy to be corrected --
> > dub is unable to handle the above (at least not natively -- I'd have
> > to write my own code for building the non-D parts of the build
> > AFAIK, which defeats the purpose of having a build system in the
> > first place).
> 
> Pre build commands.
> 
> For D stuff in dub something like this works fine.
> 
> "preBuildCommands": ["dub run package:tool -- args"]

Does this mean I have to create an entire subpackage just for this
purpose?  Or in fact, one subpackage per auxiliary target?  If so, that
would seem needlessly cumbersome for something that, in my mind, is a
trivial additional node in the build graph.

Also, treating these auxiliary build targets as second-class citizens
doesn't really sit right with me. I mean, after all, it all boils down
to "build sources S1, S2, ... into targets T1, T2, ... by running
command(s) C1, C2, ...".  What if I decide to insert a postprocessing
step in the middle of one of these build chains?  E.g., after creating an
HTML file, before installing it to the staging area, I decide that I
want to run an HTML tidying utility on it?  Does that mean I have to
create another subpackage to represent this extra step?
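For concreteness, here's roughly what the one-subpackage-per-tool
arrangement rikki describes would look like in a dub.json (all the names
here -- myproject, htmlgen, xyz.template -- are made up for illustration):

```json
{
    "name": "myproject",
    "targetType": "executable",
    "subPackages": [
        {
            "name": "htmlgen",
            "targetType": "executable",
            "sourcePaths": ["tools/htmlgen"]
        }
    ],
    "preBuildCommands": [
        "dub run myproject:htmlgen -- xyz.template xyz.html"
    ]
}
```

Note that inserting the tidy step above means appending yet another
command string to preBuildCommands (or adding yet another subpackage),
not adding a node to a build graph.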


> But what you are describing is something automatic, which is not
> currently supported.

What do you mean by "automatic"?  These targets are generally not
automatically inferrable, i.e., I'm not expecting that if I say "build
xyz.html" dub would magically know that in order to build HTML files it
needs to compile a.d, b.d, c.d into abc.exe and run abc.exe on
xyz.template in order to produce xyz.html.  Obviously these build steps
must be explicitly stated somewhere.

But I do expect that build products generated by these steps would be
smoothly integrated into the build, i.e., if "code.template" is
preprocessed by some tool "helper.exe" to produce "code.d", then there
should be a way to compile "code.d" into the main executable as well.
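As far as I can tell, the closest dub gets to this today is
preGenerateCommands plus making sure the generated file lands somewhere
in a listed source path -- something like the following sketch (same
hypothetical helper/code.template names as above; whether dub then
correctly tracks the staleness of the generated code.d is exactly the
question):

```json
{
    "name": "myproject",
    "sourcePaths": ["source"],
    "preGenerateCommands": [
        "$DUB run myproject:helper -- code.template source/code.d"
    ]
}
```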


[...]
> > - Network dependence (I'd *really* like for it *not* to depend on
> >   internet access being available by default, only when I ask it
> >   to).  IIRC there's some switch or option that does this, it would
> >   be nice if there was a local setting I could toggle to make this
> >   automatic.
> 
> https://dub.pm/settings
> 
> So yeah settings file already supports this.

Which setting disables network lookup by default?  Glancing at that
page, it's not obvious which setting it is and what value I should set
it to.
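For the record, skipRegistry looks like the relevant key on that page;
presumably something like this in the settings file would do it (the
value names are taken from the --skip-registry command-line option,
which accepts none/standard/configured/all -- I haven't verified this):

```json
{
    "skipRegistry": "all"
}
```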


> > - Performance: is there an option to skip the expensive NP-complete
> >   dependency resolution step at the beginning for faster turnaround
> >   time? When I'm debugging something I do *not* want dub to do
> >   anything except recompile local source, no network access, no
> >   package dependency resolution, nothing, just *build* the darned
> >   thing and leave it at that.
> 
> I've had a look at this, it would take a good bit of refactoring to
> split this out into dub.selections.json *I think*.
> 
> But yeah you're right, if nothing has changed it should be cached.

Not just that, when I'm recompiling a project during debugging, I don't
want dub to look at the network *at all*.  I don't care if upstream has
released a critical zero-day exploit fix, I do NOT want the code to
suddenly change from under me when I'm trying to trace down a segfault.
I want it to just build the sources that are currently on the local
machine, and that's it.

Also, sometimes if I'm on the road without internet access, I do not
want to suddenly become unable to build my project.
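FWIW, per `dub build --help` there are flags that should cover the
debugging case -- assuming they do what their descriptions say:

```shell
# Skip dependency resolution entirely and never consult any registry;
# just rebuild whatever sources and selections are already local.
dub build --nodeps --skip-registry=all
```

What I'm asking for is a settings-file way to make this the default,
instead of having to remember two flags on every invocation.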


> > - Reproducibility: if I change one source file out of a directory of
> >   50, I want the build system to be able to detect that one change,
> >   determine the *minimum* sequence of actions to update current
> >   targets, and run only those actions. After running these actions,
> >   the targets should be in EXACTLY the same state as if I had
> >   rebuilt the entire workspace from a clean checkout. And this
> >   should NOT be dependent on the current state of the workspace (it
> >   should know to overwrite stale intermediates, etc., so that the
> >   final targets are in the correct state).
> 
> I was questioning if the problem here is the compiler stuff, but its
> not.
> 
> However, I don't think that this should be the default. Processing all
> of those dates, caching them... yeah won't be cheap either.

Two comments here:

1) Dates should NOT be used as the basis for detecting changes, because
   they're not reliable. Preferably some kind of checksum should be used
   (a cheap one like MD5 or CRC would do -- we don't need cryptographic
   strength here). Why?  Because sometimes, an updated timestamp does
   *not* mean the file actually changed.

   For example, if I `git checkout` a branch to look at something and
   switch back later, the file may have been touched during the switch,
   but afterwards its contents are identical to when it was last built.
   In this case, targets that depend on that file do not need to be
   rebuilt; they can be skipped entirely. This can sometimes lead to
   better performance, e.g., if a commonly-imported module is touched in
   this way, but you realize it hasn't actually changed, you can prune
   away large parts of the build graph.
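   The check in (1) is cheap to sketch in D itself -- std.digest ships
   both CRC32 and MD5 (the path/cache names here are made up):

```d
import std.digest.crc : crc32Of;
import std.digest : toHexString;
import std.file : read;

/// True iff `path`'s contents differ from the checksum recorded at the
/// last successful build.  A touched-but-identical file returns false,
/// so everything downstream of it can be pruned from the rebuild.
bool contentChanged(string path, string recordedSum)
{
    auto bytes = cast(const(ubyte)[]) read(path);
    return crc32Of(bytes).toHexString.idup != recordedSum;
}
```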

2) The performance issue has already been solved, see for example:

	https://gittup.org/tup/

   The underlying idea is: *don't* scan the entire source tree to detect
   changes, use modern OS facilities (inotify, FileSystemWatcher, etc.)
   to let the OS tell you when something changes.  This allows the build
   time to be O(n), where n is the size of the change, rather than O(N)
   where N is the size of the workspace.  This is important for
   scalability to large projects where N is usually significantly larger
   than n.
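On Linux the OS half of this is a few lines via druntime's inotify
bindings -- a bare-bones, Linux-only sketch (a real build daemon would
loop, buffer, and parse the event records rather than discard them):

```d
import core.sys.linux.sys.inotify;
import core.sys.posix.unistd : read;
import std.string : toStringz;

// Block until some file under `dir` is written or moved in, then
// return.  A build daemon would loop here, marking only the reported
// files dirty: O(size of change), with no scan of the workspace.
void waitForChange(string dir)
{
    int fd = inotify_init();
    inotify_add_watch(fd, dir.toStringz, IN_CLOSE_WRITE | IN_MOVED_TO);
    align(4) ubyte[4096] buf;
    read(fd, buf.ptr, buf.length);
}
```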


T

-- 
An elephant: A mouse built to government specifications. -- Robert Heinlein
