[phobos] datetime review

Jonathan M Davis jmdavisProg at gmx.com
Sat Oct 9 17:57:37 PDT 2010


On Saturday 09 October 2010 12:27:26 Andrei Alexandrescu wrote:
> On 10/8/10 16:04 CDT, Jonathan M Davis wrote:
> > Okay. Here is my current datetime code for review.
> 
> Solid work. I'm on vacation and I gave it a thorough read last night.
> David's parallelism library would have come first, but that's not
> vacation reading...
> 
> A word of caution. I've been on the boost mailing list for a good while
> and they have the best review process ever. The result is invariably
> twofold: (a) a very good quality library; (b) a suicidal-depressive
> library author.

Totally understandable. I am by no means claiming that what I've done is 
perfect.

> 
> You need to expect to take quite a few hits from me and others, but
> please don't take it personally. Also, a word for the other reviewers:
> it's much easier (and therefore more luring) to bury a submission for
> its weaknesses instead of suggesting concrete steps to make it stronger.
> 
> High-level thoughts
> ===================
> 
> * At 40KLOC, the library is probably on a par with the first Netscape
> browser. This is a bit extravagant for a date and time library, and last
> thing we want is Phobos to become a collection of elaborate trivialities
> (I'm not saying date/time is a trivial matter, but of the 40KLOC there
> are probably only a couple thousand that do actual work, of which a
> couple hundred that do nontrivial work). We should look into ways to
> reduce its sheer size, and, most importantly, the size of its
> user-visible API. (More concrete suggestions to follow.)

Much of the file size is due to unit tests and documentation rather than then 
main code, so I think that it's more of a question of the size of the API rather 
than the file size.

> * The library has a fundamental inconsistency: it can't decide whether
> it goes with sheer "long" representation for various durations (days,
> months, hnseconds etc.) versus typed representations (MonthDuration,
> HNDuration, JointDuration).This hurts everybody everywhere: the library
> spends real estate to implement durations, and then the API explodes
> because it doesn't use them so it needs to be explicit about measurement
> units. For example, we have a hecatomb of addXxx functions: addDays,
> addWeeks, addThis, addThat, addTheOther. The design should have
> exploited typed durations and support += for durations. Then you write d
> += add(weeks(3)); and so on.

The main problem with using only durations to add is the AllowDayOverflow issue. 
You can deal with it with the add functions but not with operator overloads 
(though you cauld have a name function for adding durations which took an 
AllowDayOverflow). Also, at the moment, the durations use the addXXX functions to 
actually implement arithmetic with time points. There's also the issue of 
rollXXX which seems a bit funny to me to use with durations (it would have to 
use a named function instead of operator overloading regardless).

I could alter add and roll to take durations instead of longs and then make it 
so that there's only addDuration (instead of addTUnit()). Without a duration 
type per time unit type, however, you can't really roll time units with 
durations.

Java has add(int field, int amount) and roll(int field, int amount) functions, and 
I believe that at the time I stared putting in the addXXX() and rollXXX() 
functions, I was thinking that it was rather annoying to have to give it a 
number fo the time unit rather than just naming it, and so I ended up creating 
functions for each type that made since on a given time point and had addTUnit() 
and rollTUnit() for generic code. It did result in an awful lot of functions 
though, so I do agree that it's not exactly ideal as is.

I will have to look at what all of the implications are of removing them though.

> * Speaking of which, we come back at Walter's point: should we go with
> "long" for representing various durations and then make the meaning
> clear with function names, OR should we go with explicit strong types
> for times and durations? It's a good question, and arguments could be
> made for either case. Using "long" would probably trim away 60% of the
> code size. On the other hand, if we look at successful date and time
> libraries, we see that they do tend to be elaborate about granular
> types. So my suggestion is to not mess with success and go with
> elaborate typing. BUT I do want to cut a good fraction of sheer
> repetition by use of the language. AND I also want to not be able to
> find "long" in the library except in e.g. multiplication of a duration
> with a scalar and in the private definition of the typed abstractions.
> 
> * This brings me to one beef I have: the library is repetitive and
> tedious. This is because it makes very little use of generative and
> generic capabilities. Most of it could have been written in C with
> structs and functions. I'll follow up with concrete suggestions.

I ended up making it fairly easy to _use_ in generic code, I believe, but it 
doesn't do much to generate code, which does indeed result in a lot of generic 
and repetitive code.

> * The library uses at least one actively unrecommended technique (global
> alias for built-ins) and one questionable or at least
> not-yet-widely-used technique (use of a class as a namespace).
> Explanations follow.
> 
> Details
> =======
> 
> * core.d is by far the weakest part of the library. In fact, it being
> the first one I've read, I got biased early on against the library. I'm
> very glad I continued to read through.
> 
> * Global aliases of the form
> 
> alias short  Year;              /// Unit for holding a Gregorian year.
> 
> are a poor technique in C, C++, and D. I've read articles and rants
> about it in several places. In brief, the problem is that such aliases
> lure their user into thinking they are genuine types with the usual
> safeguards that types offer (such as disallowing illegal operations
> during compilation time and/or run time).
> 
> DayOfMonth dom;
> Month m;
> ...
> m += dom; // won't compile I guess?!?
> dom = 32; // throw I suppose?!?
> 
> Recommendation: eliminate these. At best make them private and give them
> the "Rep" (representation) suffix. But I suspect you don't even need them.

They were meant primarily to better document what a function took (like Year, 
MonthOfYear, DayOfMonth rather than int, int, int) and they restrict the range 
of what you can feed a function (like short instead of int), but they are rather 
ugly and using anything smaller than int gets a bit annoying to use as I found 
in the unit tests. It does have the advantage of making sure that the programmer 
restricts their values to the smaller size, but since the smaller size still 
isn't the exact range required for the given value (e.g. 1 - 12 for months or 1 
- 31 for days), it doesn't really help all that much anyway. It also creates 
greater confusion when various other functions (like addXXX()) take longs, 
though if I did it correctly, that it least manages to indicate when you use a 
value in a specific range rather than x amount of a time unit (e.g. the exact 
month vs the number of months to add). It certainly isn't the greatest as is 
though, and I'm more than open to fixing it.

The main one that I'm leary on fixing is StdTime (and AdjustedTime) since it has 
a specific meaning beyond long and it helps with the documentation. But ideally, 
programmers would use such functions very sparingly anyway, since SysTime wraps 
and deals with most of that for you.

> 
> * This enum:
> 
> enum Month : MonthOfYear { jan = 1, /// January.
>                             feb,     /// February.
>                             mar,     /// March.
>                             apr,     /// April.
>                             may,     /// May.
>                             jun,     /// June.
>                             jul,     /// July.
>                             aug,     /// August.
>                             sep,     /// September.
>                             oct,     /// October.
>                             nov,     /// November.
>                             dec      /// December.
>                           }
> 
> The doc comments are not needed. Just put "/// Month" for the first and
> "/// Ditto" for all others.
> 
> * Similar remark about comments to:
> 
> enum DayOfWeek : ushort { sunday = 0, /// Sunday.
>                            monday,     /// Monday.
>                            tuesday,    /// Tuesday.
>                            wednesday,  /// Wednesday.
>                            thursday,   /// Thursday.
>                            friday,     /// Friday.
>                            saturday    /// Saturday.
>                          }
> 
> The names are self-explanatory, no need to repeat them!

I added the individual enume comments at the last minute as soon as I realized 
that for some reason putting ddoc comments on the enum itself didn't result in 
the enum values being shown in the documentation (highly negative in my opinion 
- maybe I should add an enhancement request for that). The names _should_ 
definitely be self explanatory, but since I had to add comments on them for them 
to end up in the docs, and the docs put the space for the comment regardless of 
whether I put the comment, I just put the full names there. Personally, I'd much 
prefer to have

/++ ddoc comment for enum +/
enum Month : MonthOfYear {jan = 1, feb, mar, apr, may, jun, jul, aug, sep, oct, 
nov, dec }

in the file and have them listed that way in the generated ddoc without having a 
single line per enum value and without my having to put a comment on each one. 
Being _able_ to put a comment on each one is all well and good, but _having_ to 
is irritating.

> I also notice
> that (a) DayOfWeek uses ushort directly (why not ubyte?) instead of
> going through a global alias (like Month) that I'd recommend you remove
> anyway.

Hmm. I would have thought that I'd have used ubyte for DayOfWeek, but I guess 
that I didn't. I've found dealing with types smaller than int annoying enough 
though, that I may just make it so that only the interior representation uses 
the smaller types and the API ditches them (especially when I get rid of names 
like Year and Month).

> * These types suggest to me that we need to finally implement a Bounded
> type such that Bounded!(ubyte, 1, 7) would be a good representation for
> a day of week. We discussed Bounded a fair amount in the newsgroup a
> while ago. I think that could save a lot of code down the road (e.g. in
> duration.d), in addition to being a good library artifact of its own.

That would indeed be useful.l Restriting to smaller types isn't enough because 
their range is still too large. It's also _highly_ annoying when it comes to 
using them because of the necessary conversions. And since the conversions don't 
even do the job to get the value down to the exact range required, increasingly 
I don't think that they're really worth it.

> * This enum and its kin:
> 
> enum AllowDayOverflow
> {
>      /// Yes, allow day overflow.
>      yes,
> 
>      /// No, don't allow day overflow.
>      no
> }
> 
> should be like this:
> 
> enum AllowDayOverflow
> {
>      /// No, don't allow day overflow.
>      no,
>      /// Yes, allow day overflow.
>      yes
> }
> 
> such that code can write:
> 
> void fun(AllowDayOverflow allowDayOverflow) {
>     ...
>     if (allowDayOverflow) ...
>     ...
>     if (!allowDayOverflow) ...
> }
> 
> Right now writing such works but with the reverse effect.

Very good point. I didn't think of that.

> 
> * "this(string msg, string file = __FILE__, size_t line = __LINE__,
> Throwable next = null) nothrow" -> the __FILE__ and __LINE__ are useless
> (they are yours, not your caller's). You need to make them template
> parameters to work, see other places in Phobos. At least that's what I
> recall. Walter?

I was _sure_ that the default arguments were filled in at the call site... And 
they do! This program

----------- main module
import other;

void main()
{
    writeval();
}

--------- other module
import std.stdio;

void writeval(string file = __FILE__, size_t line = __LINE__)
{
    writefln("%s(%s)", file, line);
}

prints main.d(5)

Or am I misunderstanding what you're saying?

> * "final class TUnitConv" used as a namespace. Can't we organize things
> such that we introduce few enough names for such namespaces to be unneeded?

Well, with it seemed like a good way to organize some of the code, and when I 
originally created it, and I was looking to reduce how much was thrown into the 
namespace on importing datetime. And with a to!() function, it seemed even more 
critical to create such a namespace.

For Clock, Dur, and IRange, however (which are similar namespaces), I really do 
think that the added namespacing is well worth it. It makes the code much 
clearer. Dur in particular would be bad to have its functions separated out, 
because then you have functions like years(), months(), and days() which are 
going to conflict all over the place with variable  names.

> 
> * "static long to(TUnit tuFrom, TUnit tuTo)(long value)" is a spark of
> brilliance that I'd love to see a lot more of. It's essentially the
> difference between reading and memorizing 90 function names versus
> memorizing one function name and an enum with 10 values. That being
> said, the rest of the code sadly doesn't transform this spark into a
> full-blown explosion - it goes right back to triviality such as e.g.
> hnsecsToDays. Those are hardly needed in the implementation (use
> template constraints) and DEFINITELY not needed in the documentation.
> They just pollute it.

Yes. making the conversion functions which to!() uses private would be a good 
idea. I should have thought of that. Originally, I was just going to write them 
all, until I realized the cominatorial explosion of functions that that would 
cause and ended up creating to!() instead.

> Suggestions: (a) throw TUnitConv away; (b) define this and friends at
> global scope:
> 
> long convert(TUnit from, TUnit to)(long);

I'm not sure if convert is descriptive enough (TUnitConv.to!() has the distinct 
advantage of being quite explicit even if it isn't very pretty), but given how 
little is in TUnitConv, that's probably a good idea.

> That being said, convert() does go against the grain of elaborate typing
> for durations. I'd accept it as a low-level function, but really the
> library currently fosters elaborate types for durations... so there's a
> disconnect here.

The library itself needs it regardless. As for user code, well sometimes they 
may very well need to deal with converting seconds to minutes and the like, so I 
think that it's worthwhile to have in the API, but it's not something that I'd 
expect to be heavily used.

> * To be specific: throw away yearsToMonths et al. No need for them.

Well, they either need to exist internally or be inlined in TUnitConv.to!() (or 
whatever it gets renamed to), but I do agree that there's no need for them in 
the API.

> * I suggest even experimenting with strings instead of TUnit. Really
> TUnit is a vestige from C++ and Java, but we do accept string template
> parameters so how about
> 
> assert(convert!("years", "months")(2) == 24);
> 
> etc. Opinions?

My first concern would be performance, but if they're templates and template 
functions, then that isn't really an issue. Hmmm... My first reaction to that is 
definitely negative, but on further reflection, it is starting to look like at 
least a decent idea of not an outright good one. It _feels_ like there should be 
something wrong with it, but TUnit is effectily just a bunch of ints that have to 
be checked with template constraints, and this is just a bunch of strings that 
would have to be checked with template constraints. I'll have to think about it 
further, but it is starting to look like a good idea.

> * All string functions in core.d make me nervous because they hint to no
> support for locales, and are very pedestrian as they go. I'd prefer at
> least some sort of table with names for "year", "years" etc. in
> conjunction with reusable code, instead of rigid, nonreusable code with
> inline strings. Same about monthOfYearToString etc.

Understandale. I didn't really know a better what to do it since it's not like 
we have internationalzition support built into D or Phobos. Sure, we have 
_unicode_ support, but that obviously isn't the same thing. I'll have to think 
of a clean way to do that with a table of some kind.

> 
> * validXxx and validXxxOfYyy is in desperate need for a template
> approach a la to/convert. Virtually anything that encodes types in
> function names is such a candidate.

It does to some extent, though some of them (like for the day of the month) 
require rather different parameters than the rest. Still, that could be dealt 
with with a template specialization. There's also not currently any generic way 
to define what the values should be tested _against_, so I'll have to think of a 
good way to make that work. It does seem like a good idea though.

> * As another random example:
> 
> DayOfWeek getDayOfWeek(DayOfGregorianCal day) pure nothrow
> 
> could/should be (in client use):
> 
> get!(TUnit.dayOfWeek, TUnit.dayOfGregorianCal)(day);

Since those functions would all have to be have specialized version of such 
templates, I'm not sure how much you gain with that approach - particularly when 
there aren't very many of those functions (and IIIRC they're all package or 
private rather than in the API). But it may be worth doing something like that. 
I'm more skeptical of this suggestion than the others though.
 
> * Generally, it's clear from above why I was displeased with core.d. At
> 1952 lines, it's a lot of pain for very little gain. It encodes date
> artifacts in function names leading to an explosion of API functions. It
> uses too little generic mechanisms to reduce both API size and
> implementation size.

datetime.core was entirely intended to hold stuff that most of the other modules 
needed rather than be all that useful on its own (as I discussed in my responses 
to Walter, I really was looking for everything to be in one module, but the file 
was getting to be too large. I attempted to split it up, and I didn't 
necessarily do so in the best manner).

And it's mostly helper function stuff, so some of it wasn't necessarily 
originally intended to be in the public API, but I put it there at the last 
minute since it seemed like a user might find it useful (like the validXXX and 
enforceXXX functions).

> * On to duration.d. It's simple: there's no need for three durations.
> Rename JointDuration to Duration and throw everything else. I'd find
> myself hard pressed to produce situations in which efficiency
> considerations make it impossible to cope with a 128-bit representation.
> 
> * If you do want to go fine-grained durations without the aggravation,
> use templates. Then you can have Duration!HNSeconds, Duration!Seconds,
> Duration!Months, whatever you want, without inflicting pain on yourself
> and others. If you play your cards right, such durations would match
> 100% the templates you define in core.d and elsewhere so you don't need
> to make a lot of implementation effort. BUT again I think one duration
> is enough. Maximum two - short duration and duration.

I really don't think that this is going to work, and the problem is months and 
years (they frequently seem to be the problem) due to the fact that they don't 
have a uniform number of days. Originally, I had a Duration struct templated on 
TUnit, but it resulted in several problems - the biggest having to do with 
addition/subtraction.

You can't convert months or years to units smaller than months, and you can't 
convert anything smaller than a month to months or years without a specific date. 
So, you can't add a Duration!month and Duration!day no matter how it may seem 
like you should be able to. Many durationos _could_ be added - such as 
Duration!day and Duration!hour - but the variable sizes of months and years 
poses a big problem for that. To fix the problem, you need a duration which joins 
the two - hence JointDuration. That way, you can write user code like 
Dur.years(7) + Dur.days(2) and have it work. The way I'd originally done that 
resulted in a combinatorial explosion of joint duration types (it was effctively 
templated on the types of duraton that you'd added together to create it), so I 
just combined the months+ and sub-months durations into two types. It cuts down 
on the amount of generated code at least.

And your suggestion to just have JointDuration has some merit, but doesn't work 
with the current scheme. The problem is that a time point can (conceptually at 
least) cover any range of time units. It could be years only or just months and 
days, or it could be only seconds. Practically-speaking, at present, they all 
cover at minimum years - days, except for TimeOfDay. Right now, with duration 
types separating out months+ and sub-months, you can have the compiler enforce 
whether a particular duration can be added to a particular type of time point. 
So, if you try and add a MonthDuration to a TimeOfDay, it won't compile (since 
it doesn't have any months in it). If you tried to just have JointDuration, the 
compiler can no longer make such a guarantee, and you'd end up with run time 
errors instead of compile-time errors.

Now, we could say that we're only worried about the types of time points which 
are defined in std.datetime and are not worried about user-defined time point 
types, and then say that we don't care about adding months or years to a 
TimeOfDay because it really has no effect when you think about it (regardless of 
how many days that is), and then it would be possible to just have a 
JointDuration. But for better-or-worse, that would mean no user-defined time 
points, or at least, it would become problematic for them. Now, I'm not sure 
that we really care about that. In making the code deal with the multiple time 
point types that I defined, it was relatively easy to make it so that it would 
work with user-defined time points, so I did. That doesn't mean that we need to 
care or worry about that. Honestly, I would expect _very_ few programmers to 
even ever think about doing it, and the types that I've defined should cover all 
but fairly exotic requirements for time points. So, I could definitely look at 
reducing the durations down to a single JointDuration.

One problem with that, however, is that that could get really ugly for comparing 
durations (which is code that I noticed yesterday that I forgot to add - it's on 
the code on my machine now though). How do you compare a duration which holds 
both months and smaller units? You could have a duration which is 5 months and 
702 days and another which is 15 months and 2 hours. Which is larger? Sure, in 
this case, we can see that it's the first one, but once you get to something like 
3 months in one and 2 months and 30 days in the other, what do you do? Depending 
on which months we're talking about, either of them could be greater or they 
could be equal. And since a duration is used to indicate the difference between 
two time points, it's a given that programmers will want to compare them.

So, while I think that you have some good points, I'm not sure that what you 
suggest really improves things at all. I'm totally open to better suggestions, 
but the variable number of days in a month and a in a year is a very annoying 
problem here.

Also, it should be noted, that programmers would create all durations by either 
using arithmetic on type points or by using the functions in Dur - e.g. 
Dur.years(), Dur.days(), or Dur.seconds(), and that combined with auto 
eliminates most cases where the programmer needs to care about what the type of 
a duration actually is - the one glaring exception being a function parameter 
which would then likely need to be templatized. So, while the multiple duration 
types is annoying, I think that I've managed to asbstract the issue away for the 
most part.

Also, there's TickDuration (which was SHOO's Ticks), which I made to work as a 
duration (since it is one) but needs to stay separate for reasons of precision.  
Though I don't think that programmers using std.datetime will have much call for 
creating those directly.

> * timezone.d looks okay.
> 
> * At 23KLOC, timepoint.d better had a walkOnWater() function in there.
> 
> * currLocalTime, currUTC, currOtherTZ should be
> current!(TUnit.localTime), current!(TUnit.utc) - bringing again API
> consolidation into discussion.
 
Hmmm. Well, they're basically just a different time zone class, so it could 
become something like currTime(TimeZone tz = LocalTime.instance). I was trying 
to make it so that programmers who didn't want to worry about time zones didn't 
have to though, so whatever changes were made to reduce function duplication 
like this should try and make it so that the programmer can choose to not 
specificy the time zone and end up with local time.

> (need to finish this just about here)
> 
> * other.d - bad name. Like Walter, I think the physical design should be
> one .di file and one or more .d files.

Well, like I said before, the current module division isn't necessarily all that 
great, and what I'd really like is a single module from the perspective of the 
users of the module, so .di files seems like a good idea. I wasn't really aware 
of them before (I'm sure that I'd seen them mentioned before, but I haven't seen 
them discussed much, and I'm not particularly familiar with them), so I didn't 
think of that.

> * Duration and Interval should be consolidated.

As in the types or the modules? The types can't be cleanly consolidated because 
they're distinct concepts. As for the modules, I have no problem with that. It's 
just that Boost (and thus my code) is base on the three concepts of time points, 
time durations, and time intervals, so it seemed like a good way to split up the 
module.

- Jonathan M Davis


More information about the phobos mailing list