Splitting std.algorithm

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Tue Jan 20 18:02:47 PST 2015


On Tue, Jan 20, 2015 at 05:26:41PM -0800, Jonathan M Davis via Digitalmars-d wrote:
> On Tuesday, January 20, 2015 16:10:26 Andrei Alexandrescu via Digitalmars-d wrote:
> > On 1/20/15 3:40 PM, H. S. Teoh via Digitalmars-d wrote:
> > > - Andrei doesn't approve because apparently some people think "big
> > > files are not a problem".
> >
> > cc Jonathan M. Davis, Steve Schveighoffer if I remember correctly :o).
> 
> I honestly think that many developers are overly interested in having
> small modules. Splitting stuff up too much causes maintenance problems
> (e.g. it becomes that much harder to find everything when there are a
> lot of files to look through, and it's that much less obvious which
> module something might live in),

Although I agree that we shouldn't be splitting up stuff just for the
sake of splitting it up, I don't agree that it's a problem to find
stuff. In this day and age, most software projects are too large to scan
through manually; you'd use search tools like grep or IDE search
or whatever. Besides, a "real" editor like vim :-P is most useful when
you navigate via the search function instead of the directional/paging
keys anyway, and with D support in ctags, I don't see why splitting
things into smaller files would be a problem. That on its own doesn't
justify splitting, of course, but neither does it count against
splitting IMO.


> and in my experience, large modules like std.algorithm or std.datetime
> are actually quite maintainable.  However, that doesn't mean that we
> wouldn't be better off splitting up the particularly large ones. I
> just started on splitting std.datetime again the other day, and
> hopefully I can find time enough to finish it before I end up having
> to deal with merging other changes in.

std.datetime is one of those things that has grown large enough that
it's causing a noticeable pause when I open it in my editor or search
for a symbol... I think that's nearing the point where splitting just on
the basis of size may become justifiable. :-P

(Having said that, though, std.datetime unittests actually compile and
run on my machine, in spite of their far larger number, yet
std.algorithm doesn't. I think it's because of too many deeply-nested
templates in std.algorithm, which probably includes a problem of my own
making, namely one of the overloads of cartesianProduct, that causes an
exponential number of recursive template instantiations. I've been
meaning to fix that, and have in fact managed to fix it for the finite
range case, but the infinite range case thus far eludes me.)


> As for std.algorithm, I think that the fact that the unit tests take
> up too much memory on some of the Phobos developers' machines is
> enough to merit at least looking at splitting it up.

Yeah, I haven't been able to run Phobos unittests locally for months
now (perhaps even a year?). I think that's pretty near the point of
being ridiculous.


> It's an actual, objective problem rather than a subjective one. And
> std.algorithm contains enough disparate functions that it certainly
> wouldn't hurt us to split it up from an organizational point of view
> either. So, if H. S. Teoh has managed to split it in a sane way, it
> makes good sense for us to look it over and merge it if it looks good.

I think the disparate functions part is probably the biggest reason to
split it. Lumping disparate functions into one file means all the
disparate dependencies of said functions also get lumped into one file,
so if you import X, you're also forced to import Z just because Y, which
you don't need, happens to sit in the same file as X and Y imports Z.
I'm pretty sure this tangled web of interdependencies between Phobos
modules is responsible for a significant proportion of complaints about
Phobos template bloat / excessive executable sizes. It's probably also
behind a somewhat amusing finding of mine from some time ago (dunno if
it's still true): importing std.algorithm, without actually referencing
anything in it, introduces a dependency on std.complex in your program,
even though you never use anything that might remotely need to
reference std.complex. I wasn't able to track down the source of this
issue before, because std.algorithm was just far too big to manage; but
perhaps after the split it will become more tractable.
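
To make the X/Y/Z argument concrete, here is a toy model of it, written
in Python rather than D so that it stays a sketch of the reasoning and
not a claim about what the compiler actually does. The module names
"big", "big_x" and "big_y" are made up for illustration; X, Y and Z are
the same placeholders as in the paragraph above.

```python
# Toy model: each module maps its symbols to the modules they need.
# All names here are hypothetical; nothing is taken from Phobos.

def module_imports(layout):
    """Collapse per-symbol needs into one import set per module."""
    return {module: {dep for deps in symbols.values() for dep in deps}
            for module, symbols in layout.items()}

def reachable(imports, start):
    """Every module pulled in, directly or transitively, by importing start."""
    seen, stack = set(), [start]
    while stack:
        for dep in imports.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Before the split: X and Y share a file, and only Y needs Z.
lumped = {"big": {"X": set(), "Y": {"Z"}}, "Z": {}}
# After the split: X and Y each get their own file.
split = {"big_x": {"X": set()}, "big_y": {"Y": {"Z"}}, "Z": {}}

pulled_in_lumped = reachable(module_imports(lumped), "big")
pulled_in_split = reachable(module_imports(split), "big_x")

print(pulled_in_lumped)  # Z comes along even if you only wanted X
print(pulled_in_split)   # nothing extra once X lives on its own
```

The only point of the model is that the import set belongs to the file,
not to the function you actually wanted from it.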

Lumping also introduces some inadvertent circular dependencies that can
cause hard-to-understand bugs, especially when conditional compilation
is involved. If you have two modules A and B, and A.x depends on B.x and
B.y depends on A.y, then you have a circular dependency between A and B
even though in actuality the symbols *aren't* circularly dependent. But since
D's import granularity is the module, the circular dependency is there,
at which point it becomes a tricky thing to make sure things are
instantiated in the right order to resolve the apparent dependency loop,
otherwise a static if somewhere might fail where it shouldn't. (I've
seen this problem before but didn't have the patience to actually
unravel it down to the actual cause. Reducing the number of gratuitous
dependencies would help a lot in making this easier to track down.)
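
The A/B example can be checked mechanically. Below is a small Python
sketch (again a model of the argument, not of how dmd resolves imports)
that builds the dependency graph twice: once between individual symbols,
and once between the modules that contain them. The helper names
to_module_graph and has_cycle are my own, purely for illustration.

```python
# Symbol-level dependencies from the example: A.x uses B.x, B.y uses A.y.
symbol_deps = {
    "A.x": {"B.x"},
    "B.y": {"A.y"},
    "A.y": set(),
    "B.x": set(),
}

def to_module_graph(deps):
    """Coarsen a symbol graph to module granularity ("A.x" -> "A")."""
    graph = {}
    for symbol, used in deps.items():
        module = symbol.split(".")[0]
        graph.setdefault(module, set())
        for other in used:
            other_module = other.split(".")[0]
            if other_module != module:
                graph[module].add(other_module)
    return graph

def has_cycle(graph):
    """Depth-first search that reports whether any cycle exists."""
    done, active = set(), set()

    def visit(node):
        if node in active:   # back edge: we returned to a node in progress
            return True
        if node in done:
            return False
        active.add(node)
        found = any(visit(dep) for dep in graph.get(node, ()))
        active.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)

symbol_level_cycle = has_cycle(symbol_deps)
module_level_cycle = has_cycle(to_module_graph(symbol_deps))

print(symbol_level_cycle)   # False: no symbol depends on itself
print(module_level_cycle)   # True: A imports B and B imports A
```

So the loop exists only because of the granularity at which the
dependencies are recorded, which is exactly why moving x and y into
separate modules would make it go away.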


[...]
> I had thought that the consensus was already that we should split
> std.algorithm at some point. The trick was spending the time to do it
> and get it right.
[...]

That's what I thought too, which is why I was a bit taken aback when
Andrei seemed to disapprove of the PR.


T

-- 
I think the conspiracy theorists are out to get us...

