[phobos] Split std.datetime in two?

Steve Schveighoffer schveiguy at yahoo.com
Thu Feb 10 14:31:38 PST 2011


---- Original Message -----

> From: Don Clugston <dclugston at googlemail.com>
> On 10 February 2011 16:50, Steve Schveighoffer <schveiguy at yahoo.com> wrote:
> > ----- Original Message -----
> >> From: Andrei Alexandrescu <andrei at erdani.com>
> >> std.datetime has 34219 lines, which accounts for over 26% of the entire Phobos
> >> size. If Jonathan will (as he promised, I didn't forget :o)) fix line sizes
> >> to conform to 80 columns, then std.datetime will become 40961 lines, or straight
> >> 30% of Phobos.
> >>
> >> (This might have to do with the increase of "hello, world" that was
> >> noted by some people on the compiler list.)
> >
> > I don't think so, but I cannot be sure. I'd say a full 75-90% of std.datetime is unit tests or documentation.  That shouldn't increase the size of the lib, and certainly not by as much as it does.
> 
> I'm almost certain that it is the cause, though this is not
> std.datetime's fault.
> Evidence from the map file shows that the executable size is roughly
> proportional to the number of lines of code in Phobos...
> The deeper cause is that the linker is failing miserably, it hardly
> discards anything.

I think the cause is DMD; I just ran some simple tests.

See here: http://d.puremagic.com/issues/show_bug.cgi?id=5560

Note that this is a small amount, but I'm sure there are other cases of weird code inclusion that dmd should never emit in the first place.  Essentially, I think dmd shouldn't leave everything up to the linker; it should trim out code that it should not be compiling.

> 
> > If it is to blame, maybe there is something wrong with the compiler still including unittest code for -lib or something...
> >
> >> I understand there are factors that contribute to that: date and time
> >> manipulation is a bulky endeavor, there's a ton of unittests, and
> >> there's a lot of documentation. But at a level I find it difficult to digest
> >> the fact that in sheer numbers date and time manipulation accounts for 30% of
> >> Phobos. As a comparison point, std.algorithm does arguably a lot of work, has
> >> adequate documentation, and has unittest coverage at 95%, yet does all that in a
> >> "measly" 8027 lines.
> >
> > Lines of file does not mean % of a library, especially when a large portion of it is not compiled.  I think we need to stop this prejudice against uncompiled LOC.  I fully support having unit tests next to the code being tested, it's the whole point of the builtin unit test system in D.
> >
> > Bottom line, the doc generator should do a better job of generating documentation, so we don't *have* to open the file, and if std.datetime is adding too much binary to the exe, we should fix whatever problems dmd is likely having there.
> 
> Yes, but on the other hand..
> (1) There has to be a maximum acceptable source file size.

Why?  OCD is not a good reason ;)

When developing a visual .NET application, I add a button, and the IDE just adds another function to handle its operation at the end of the file.  With super-complex applications this file gets to be thousands and thousands of lines long.  But it never bothers me because I don't have to think about it.  I don't care where it puts the function; I just care that there is a function attached to the button.

File size should not be any problem, since the compiler should *correctly* trim out lines that are not compiled.  The issue is, and always should be, the compiled code.  If the code is bloated or covers too many loosely related concepts, it should be split into different modules.

I actually would have no problems splitting std.datetime into different modules, but certainly not because of the size of the code.  It's because someone might want to just deal with time without having to deal with calendars/dates.
 
> Personally I start to feel uncomfortable above 2000 lines, and get an
> uncontrollable urge to split at 5000 lines.  That's just me, but I
> suggest all modules should be short. And at 35000 lines,
> std.datetime.length > short.max.

I think there should be no limit on a module's size; a module should cover a certain concept.  If the concepts contained within the file are too disjoint, then splitting is a good option.  But LOC should never be a factor.  The compiler handles it just fine, and nobody should need to look at the file to use the lib (assuming the doc generator is good enough), so there is no harm here.

> 
> (2) Actually, it seems that most of size actually comes because every
> test is written 'by hand'. If they were done as arrays [parameter1,
> parameter2, result]...
> with a loop, they'd be a lot shorter. (I crunched down the
> std.math.exp tests enormously by doing this). Looking at that module,
> I get the feeling that there's been a lot of cut-and-paste.

These are good ideas, but at the same time, the problem I have with unit tests is that often all you get is the line number where the failure occurs.  This makes loops work *very* poorly for determining what the failure is.  I understand there is some improvement on this front; hopefully it will make it easier to write loops.
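
To make the array-and-loop idea concrete, here is a minimal sketch of a table-driven unittest whose assert message names the failing row, so the line number is not the only clue.  Note that daysInMonth is a hypothetical function made up for illustration; it is not taken from std.datetime, and the test rows are my own examples.

```d
import std.conv : text;

// Hypothetical function under test (NOT from std.datetime):
// number of days in a Gregorian calendar month.
int daysInMonth(int year, int month)
{
    immutable bool leap =
        (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;

    switch (month)
    {
        case 2:
            return leap ? 29 : 28;
        case 4, 6, 9, 11:
            return 30;
        default:
            return 31;
    }
}

unittest
{
    // Each row is [year, month, expected result].
    static immutable int[3][] cases = [
        [2011, 1, 31],
        [2011, 2, 28],
        [2012, 2, 29],
        [1900, 2, 28],
        [2000, 2, 29],
        [2011, 4, 30],
    ];

    foreach (i, c; cases)
    {
        immutable got = daysInMonth(c[0], c[1]);

        // The message identifies the failing row and its inputs, so a
        // failure inside the loop can be traced to a specific case.
        assert(got == c[2],
               text("case ", i, ": daysInMonth(", c[0], ", ", c[1],
                    ") returned ", got, ", expected ", c[2]));
    }
}
```

Something like this should build with dmd -unittest once a main function is supplied.  The cost is that every such loop has to assemble its own message by hand, which is part of why I'd like the built-in failure reporting to improve.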

> It is a little disconcerting if D really cannot write unittesting code
> concisely. If it really needs to be that big, that part of the
> language needs more work; or we need more helper functions. Or both.

Not necessarily.  A unit test is intended to be constructed and tested as a unit.  It necessarily is verbose and repetitive.  Generally (and I think this is the case for datetime), more unit tests == more coverage.  That's not a bad thing.

-Steve



      

