what are guidelines for when to split a module into a package?

Thu Feb 22 07:31:11 UTC 2018

On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d 
wrote:
> from my perspective it makes sense to split a module M into submodules
> A, B when:
> * M is large
> * there's little interaction between A and B (eg only few symbols from
> A are needed in B and vice versa)
> * A and B are logically grouped (that is domain specific)
> * it doesn't turn into an extreme (1 function per module)
>
> Advantages of splitting:
> * easier to review
> * easier to edit (no need to scroll much to see entirety of module
> we're editing)
> * less pollution from top-level imports as they only affect submodule
> (likewise with top-level attributes etc)
> * more modular
> * doesn't affect existing code since `import M` will continue to work
> after M is split into a package
> * less memory when using separate compilation
> * allows fine-grained compiler options (eg we can compile B with `-O` if
> needed)
> * allows to run unittests just for A instead of M
> * allows selective import in client to avoid pulling in too many
> dependencies (see arguments that were made for std.range.primitives)
>
> Disadvantages of splitting:
> * more files; but not sure why that's a problem so long as we don't go
> into extremes (eg 1 function per module or other things of bad taste)
>
> ---
> while working on https://github.com/dlang/phobos/pull/6178 I had
> initially split M:std.array into submodules:
> A:std.array.util (the old std.array) and B:std.array.static_array
> (everything added in the PR)
> IMO this made sense according to my above criteria (in this case there
> was 0 interaction between A and B), but the reviewers disagreed with
> the split.
>
> So, what are the guidelines?

It's decided on a case-by-case basis but is generally only done if the
module is quite large. std.array is not particularly large. It's less than
4000 lines, including unit tests and documentation, and it only has 18
top-level symbols.

Also, remember that within Phobos, imports are supposed to be as localized
as possible - both in terms of where the import is placed and in terms of
selective imports - e.g. it would be

import std.algorithm.searching : find;

not

import std.algorithm : find;

which means that splitting the module then requires that all of those
imports be even more specific. User code can choose to do that or not, but
it does make having modules split up further that much more tedious. Related
to that is the fact that anyone searching for these symbols now has more
modules to search through. So, finding symbols will be harder. Take
std.algorithm for instance. It was split, because it was getting large
enough that compiling it on machines without large amounts of memory
resulted in the compiler running out of memory. So, there was a very good
argument for splitting it. However, now, even if you know that a symbol is
in std.algorithm, do you know where in std.algorithm it is? Some are obvious
- e.g. sort is in std.algorithm.sorting. However, others are not so
obvious - e.g. where does startsWith live? Arguably, it could go in either
std.algorithm.comparison or std.algorithm.searching. It turns out that it's
in std.algorithm.searching, but I generally have to look it up. And where do
functions like map or filter live? std.algorithm.mutation?
std.algorithm.iteration? It's not necessarily obvious at all.
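
For reference, here is a minimal sketch of what localized, selective imports
look like for the symbols mentioned above. As far as I know, map and filter
live in std.algorithm.iteration. The helper name sortedEvenSquares is made up
for illustration and is not part of Phobos.

```d
// Hypothetical helper (not in Phobos): squares the even numbers in xs
// and returns them in ascending order.
int[] sortedEvenSquares(int[] xs)
{
    // Function-local, selective imports: each symbol names its submodule.
    import std.algorithm.iteration : filter, map;
    import std.algorithm.sorting : sort;
    import std.array : array;

    auto result = xs.filter!(x => x % 2 == 0).map!(x => x * x).array;
    result.sort(); // sorts the array in place
    return result;
}

void main()
{
    // startsWith and find both come from std.algorithm.searching.
    import std.algorithm.searching : find, startsWith;

    assert(sortedEvenSquares([4, 1, 2, 3]) == [4, 16]);
    assert("std.array".startsWith("std"));
    assert([1, 2, 3, 4].find(3) == [3, 4]);
}
```

Every import line has to name the exact submodule, which is the lookup cost
described above. User code can still write a plain `import std.algorithm;`
and skip that step.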

From the perspective of users trying to find stuff, splitting modules up
comes at a real cost, and I honestly don't understand why some folks are in
a hurry to make modules really small. That means more import statements when
using those modules, and it means that it's harder to find symbols.

Personally, I think that we should be very slow to consider splitting
modules and only do so when it's clear that there's a real need, and
std.array is nowhere near that level.

- Jonathan M Davis