std.compress

Thu Jun 6 07:26:49 PDT 2013

On Wednesday, 5 June 2013 at 18:21:04 UTC, H. S. Teoh wrote:
> On Wed, Jun 05, 2013 at 01:20:48PM -0400, Jonathan M Davis wrote:
>> On Wednesday, June 05, 2013 14:02:37 Jakob Ovrum wrote:
>> > We have a standard library in disagreement with the language's
>> > encapsulation mechanics. The module/package system in D is almost
>> > ignored in Phobos (and that's probably why the package system
>> > still has all these little things needing ironing out). It seems
>> > to owe influence to typical C and C++ library structure, which is
>> > simply suboptimal in D's module system.
>> 
>> I honestly don't see how Phobos is in disagreement with the module
>> system. No, it doesn't use hierarchy as much as it should, and there
>> are a few modules that are overly large (like std.algorithm or
>> std.datetime), but for the most part, I don't see any problem with its
>> level of encapsulation. It's mainly just its organization which could
>> have been better. My primary objection here is that it seems
>> ridiculous to me to create lots of tiny modules. I hate how Java does
>> that sort of thing, but there you're _forced_ to in many cases,
>> whereas we have the opportunity to actually group things together in a
>> single module where appropriate. And having whole modules with only
>> one or two functions is way too small IMHO, and that seems to be what
>> we're proposing here.
> [...]
>
> As Andrei pointed out, I think we need to look at this not from a size
> perspective (number of lines, number of functions, etc.), but from an
> API perspective: do these functions/structs belong together, or are they
> only marginally related? More precisely, if some user code uses function
> X, is that code equally likely to also use Y? Are there common use cases
> in which only Y is used, not X?
>
> If the use of function X almost always implies the use of function Y
> (and vice versa), then they belong in the same module. Otherwise, I'd
> say they are candidates for splitting up.
>
> If function X uses function Z, and function Y also uses function Z, but
> the use of X does not necessarily imply the use of Y (and vice versa),
> then I'd argue that X, Y, and Z should be in separate modules to
> maximize reuse and reduce the amount of code you have to pull in (you
> shouldn't be forced to pull in Y just because you use X, which calls Z,
> which Y happens to also call).
>
> This may be a bit heavy-handed for user code, but for Phobos, the
> standard library, I think the bar should be set higher. After all, one
> of the stated goals of Phobos is that you shouldn't need to pull in a
> whole ton of code just because you call a single function. Right now I
> think we're a bit short of that goal.

Massive +1

Modules are for grouping functions/types that are commonly used 
together or have interdependencies, not for grouping things that 
are in a similar category (although these things can be related).

I don't care if levenshteinDistance is a "classic algorithm", I 
don't want to have to compile it every time I want to take the 
minimum of two numbers. Barely anyone is ever going to use it, so 
it should be off in a module on its own.
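
To make that concrete, here is a rough sketch of the kind of split I 
mean. The package name mylib, the module names, and the file layout are 
all made up for illustration (this is not how Phobos is organised), and 
the three files are shown in one block only for brevity. The edit 
distance here works on UTF-8 code units, not code points, to keep it 
short.

```d
// --- file: mylib/comparison.d (hypothetical module) ---
module mylib.comparison;

/// Returns the smaller of two values of the same type.
T min(T)(T a, T b)
{
    return b < a ? b : a;
}

// --- file: mylib/editdistance.d (hypothetical module) ---
module mylib.editdistance;

import mylib.comparison : min; // the rarely used module depends on the common one

/// Levenshtein distance between s and t, counted in code units.
size_t levenshteinDistance(const(char)[] s, const(char)[] t)
{
    auto prev = new size_t[t.length + 1];
    auto cur = new size_t[t.length + 1];
    foreach (j, ref c; prev)
        c = j; // distance from the empty prefix of s to t[0 .. j]

    foreach (i; 0 .. s.length)
    {
        cur[0] = i + 1;
        foreach (j; 0 .. t.length)
        {
            immutable size_t cost = s[i] == t[j] ? 0 : 1;
            cur[j + 1] = min(min(prev[j + 1] + 1, cur[j] + 1), prev[j] + cost);
        }
        auto tmp = prev; // swap the two rows
        prev = cur;
        cur = tmp;
    }
    return prev[t.length];
}

// --- file: app.d ---
import mylib.comparison : min; // mylib.editdistance is never imported here

void main()
{
    assert(min(3, 7) == 3);
}
```

With this layout app.d only ever touches mylib/comparison.d. The 
dependency runs one way, from the rarely used module to the common one, 
so code that just wants min never sees the edit-distance module.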

There's absolutely nothing wrong with having lots of small 
modules provided that you don't end up importing the same sets of 
modules over and over. There are numerous advantages:

1. Makes it easier to manage dependencies.
1a. Reduces compile times.
1b. Reduces binary size.
1c. Benefits incremental and distributed/parallel compilation.
2. Makes version control easier, as more files mean merge 
conflicts are less likely.
3. Makes it easier to navigate files.

The only downside is that you may occasionally have to import 
more modules.