Splitting std.algorithm

Tue Jan 20 15:40:57 PST 2015

OK, so the past weekend, I was foolhardy enough to attempt (again) to
split std.algorithm into more manageable pieces... and unlucky enough to
actually succeed this time:

	https://github.com/D-Programming-Language/phobos/pull/2879

However, right now this PR is in a rather precarious situation, because
git doesn't understand the concept of moving content between files; as a
result, *any* further changes to std/algorithm.d in master must be
manually merged into the branch (as in, the diffs must be applied by
hand). As we all (should) know, such by-hand merges are extremely
error-prone, and subtle bugs can get introduced inadvertently. A
careless mistake on my part could accidentally revert a bugfix PR, for
example.

Besides, such by-hand merges are also very time-consuming, and I'd really
rather do better things than sit around manually applying diffs all
day, especially without knowing whether or not this PR is going to get
merged at all.

So I have a request: can we please decide ASAP whether or not this PR is
worth it, and, if it is, merge it right away? Since we're currently in the
process of improving docs, there's an extremely high chance that
std/algorithm.d will be touched again in the near future. I've already
spent hours manually applying the diffs (and coaxing git to behave,
which is challenging in this situation) for a *single* affected PR that
was merged today, and I really do not want to keep doing this if at all
possible. If this is a bad idea, I'd like to know right now and not
waste any more time on it.

So far, the arguments for splitting std.algorithm are:

- The file is too big. It increases compilation time and compiler memory
  usage, and makes locating a particular piece of code more difficult
  than it needs to be.

- I can't even run the Phobos unittests on my machine, because dmd runs
  out of memory and dies. This means either (1) I submit untested PRs so
  that the autotester runs the tests for me, which wastes autotester
  resources as well as my own time, since I have to go through the whole
  code-compile-test cycle via the autotester; or (2) I have to manually
  delete large swaths of code from std.algorithm while working on the PR,
  just so I can unittest my changes properly. I'm sure I'm not the only
  one here who has trouble running the Phobos unittests, which means the
  barrier to contribution is needlessly high.

- It's a conglomeration of only tenuously related functions, and as a
  result, importing std.algorithm pulls in roughly half of Phobos as
  dependencies that you may not actually need. While splitting it, I
  found that I could eliminate many module-level imports and/or
  otherwise reduce dependencies on other Phobos modules.

- Although this PR doesn't do this yet, having smaller submodules means
  that other Phobos modules that need something from std.algorithm won't
  have to import the entire thing. Importing the whole module has a
  snowball effect: you also import every dependency of std.algorithm,
  plus their respective recursive dependencies, most of which are
  unnecessary when only 1 or 2 functions are actually needed. With the
  new submodules, if you need map(), for example, you could just import
  std.algorithm.iteration : map, and you won't incur the cost of also
  importing stuff that only, say, cartesianProduct needs.

- An overly large module makes it difficult for new users to understand
  what the module does, or whether it happens to contain something they
  need. Proper documentation alleviates this to some extent, but even
  then, a set of functions categorized into 6 submodules is a lot more
  browsable than a single gigantic module that contains everything,
  including the kitchen sink.

The only argument against splitting std.algorithm (that I know of) is:

- Andrei doesn't approve because apparently some people think "big files
  are not a problem".

So, what do you think? Should we merge this, or should we not?

T

-- 
Many open minds should be closed for repairs. -- K5 user