[phobos] datetime review
Jonathan M Davis
jmdavisProg at gmx.com
Sat Oct 9 17:57:37 PDT 2010
On Saturday 09 October 2010 12:27:26 Andrei Alexandrescu wrote:
> On 10/8/10 16:04 CDT, Jonathan M Davis wrote:
> > Okay. Here is my current datetime code for review.
>
> Solid work. I'm on vacation and I gave it a thorough read last night.
> David's parallelism library would have come first, but that's not
> vacation reading...
>
> A word of caution. I've been on the boost mailing list for a good while
> and they have the best review process ever. The result is invariably
> twofold: (a) a very good quality library; (b) a suicidal-depressive
> library author.
Totally understandable. I am by no means claiming that what I've done is
perfect.
>
> You need to expect to take quite a few hits from me and others, but
> please don't take it personally. Also, a word for the other reviewers:
> it's much easier (and therefore more luring) to bury a submission for
> its weaknesses instead of suggesting concrete steps to make it stronger.
>
> High-level thoughts
> ===================
>
> * At 40KLOC, the library is probably on a par with the first Netscape
> browser. This is a bit extravagant for a date and time library, and last
> thing we want is Phobos to become a collection of elaborate trivialities
> (I'm not saying date/time is a trivial matter, but of the 40KLOC there
> are probably only a couple thousand that do actual work, of which a
> couple hundred that do nontrivial work). We should look into ways to
> reduce its sheer size, and, most importantly, the size of its
> user-visible API. (More concrete suggestions to follow.)
Much of the file size is due to unit tests and documentation rather than then
main code, so I think that it's more of a question of the size of the API rather
than the file size.
> * The library has a fundamental inconsistency: it can't decide whether
> it goes with sheer "long" representation for various durations (days,
> months, hnseconds etc.) versus typed representations (MonthDuration,
> HNDuration, JointDuration).This hurts everybody everywhere: the library
> spends real estate to implement durations, and then the API explodes
> because it doesn't use them so it needs to be explicit about measurement
> units. For example, we have a hecatomb of addXxx functions: addDays,
> addWeeks, addThis, addThat, addTheOther. The design should have
> exploited typed durations and support += for durations. Then you write d
> += add(weeks(3)); and so on.
The main problem with using only durations to add is the AllowDayOverflow issue.
You can deal with it with the add functions but not with operator overloads
(though you cauld have a name function for adding durations which took an
AllowDayOverflow). Also, at the moment, the durations use the addXXX functions to
actually implement arithmetic with time points. There's also the issue of
rollXXX which seems a bit funny to me to use with durations (it would have to
use a named function instead of operator overloading regardless).
I could alter add and roll to take durations instead of longs and then make it
so that there's only addDuration (instead of addTUnit()). Without a duration
type per time unit type, however, you can't really roll time units with
durations.
Java has add(int field, int amount) and roll(int field, int amount) functions, and
I believe that at the time I stared putting in the addXXX() and rollXXX()
functions, I was thinking that it was rather annoying to have to give it a
number fo the time unit rather than just naming it, and so I ended up creating
functions for each type that made since on a given time point and had addTUnit()
and rollTUnit() for generic code. It did result in an awful lot of functions
though, so I do agree that it's not exactly ideal as is.
I will have to look at what all of the implications are of removing them though.
> * Speaking of which, we come back at Walter's point: should we go with
> "long" for representing various durations and then make the meaning
> clear with function names, OR should we go with explicit strong types
> for times and durations? It's a good question, and arguments could be
> made for either case. Using "long" would probably trim away 60% of the
> code size. On the other hand, if we look at successful date and time
> libraries, we see that they do tend to be elaborate about granular
> types. So my suggestion is to not mess with success and go with
> elaborate typing. BUT I do want to cut a good fraction of sheer
> repetition by use of the language. AND I also want to not be able to
> find "long" in the library except in e.g. multiplication of a duration
> with a scalar and in the private definition of the typed abstractions.
>
> * This brings me to one beef I have: the library is repetitive and
> tedious. This is because it makes very little use of generative and
> generic capabilities. Most of it could have been written in C with
> structs and functions. I'll follow up with concrete suggestions.
I ended up making it fairly easy to _use_ in generic code, I believe, but it
doesn't do much to generate code, which does indeed result in a lot of generic
and repetitive code.
> * The library uses at least one actively unrecommended technique (global
> alias for built-ins) and one questionable or at least
> not-yet-widely-used technique (use of a class as a namespace).
> Explanations follow.
>
> Details
> =======
>
> * core.d is by far the weakest part of the library. In fact, it being
> the first one I've read, I got biased early on against the library. I'm
> very glad I continued to read through.
>
> * Global aliases of the form
>
> alias short Year; /// Unit for holding a Gregorian year.
>
> are a poor technique in C, C++, and D. I've read articles and rants
> about it in several places. In brief, the problem is that such aliases
> lure their user into thinking they are genuine types with the usual
> safeguards that types offer (such as disallowing illegal operations
> during compilation time and/or run time).
>
> DayOfMonth dom;
> Month m;
> ...
> m += dom; // won't compile I guess?!?
> dom = 32; // throw I suppose?!?
>
> Recommendation: eliminate these. At best make them private and give them
> the "Rep" (representation) suffix. But I suspect you don't even need them.
They were meant primarily to better document what a function took (like Year,
MonthOfYear, DayOfMonth rather than int, int, int) and they restrict the range
of what you can feed a function (like short instead of int), but they are rather
ugly and using anything smaller than int gets a bit annoying to use as I found
in the unit tests. It does have the advantage of making sure that the programmer
restricts their values to the smaller size, but since the smaller size still
isn't the exact range required for the given value (e.g. 1 - 12 for months or 1
- 31 for days), it doesn't really help all that much anyway. It also creates
greater confusion when various other functions (like addXXX()) take longs,
though if I did it correctly, that it least manages to indicate when you use a
value in a specific range rather than x amount of a time unit (e.g. the exact
month vs the number of months to add). It certainly isn't the greatest as is
though, and I'm more than open to fixing it.
The main one that I'm leary on fixing is StdTime (and AdjustedTime) since it has
a specific meaning beyond long and it helps with the documentation. But ideally,
programmers would use such functions very sparingly anyway, since SysTime wraps
and deals with most of that for you.
>
> * This enum:
>
> enum Month : MonthOfYear { jan = 1, /// January.
> feb, /// February.
> mar, /// March.
> apr, /// April.
> may, /// May.
> jun, /// June.
> jul, /// July.
> aug, /// August.
> sep, /// September.
> oct, /// October.
> nov, /// November.
> dec /// December.
> }
>
> The doc comments are not needed. Just put "/// Month" for the first and
> "/// Ditto" for all others.
>
> * Similar remark about comments to:
>
> enum DayOfWeek : ushort { sunday = 0, /// Sunday.
> monday, /// Monday.
> tuesday, /// Tuesday.
> wednesday, /// Wednesday.
> thursday, /// Thursday.
> friday, /// Friday.
> saturday /// Saturday.
> }
>
> The names are self-explanatory, no need to repeat them!
I added the individual enume comments at the last minute as soon as I realized
that for some reason putting ddoc comments on the enum itself didn't result in
the enum values being shown in the documentation (highly negative in my opinion
- maybe I should add an enhancement request for that). The names _should_
definitely be self explanatory, but since I had to add comments on them for them
to end up in the docs, and the docs put the space for the comment regardless of
whether I put the comment, I just put the full names there. Personally, I'd much
prefer to have
/++ ddoc comment for enum +/
enum Month : MonthOfYear {jan = 1, feb, mar, apr, may, jun, jul, aug, sep, oct,
nov, dec }
in the file and have them listed that way in the generated ddoc without having a
single line per enum value and without my having to put a comment on each one.
Being _able_ to put a comment on each one is all well and good, but _having_ to
is irritating.
> I also notice
> that (a) DayOfWeek uses ushort directly (why not ubyte?) instead of
> going through a global alias (like Month) that I'd recommend you remove
> anyway.
Hmm. I would have thought that I'd have used ubyte for DayOfWeek, but I guess
that I didn't. I've found dealing with types smaller than int annoying enough
though, that I may just make it so that only the interior representation uses
the smaller types and the API ditches them (especially when I get rid of names
like Year and Month).
> * These types suggest to me that we need to finally implement a Bounded
> type such that Bounded!(ubyte, 1, 7) would be a good representation for
> a day of week. We discussed Bounded a fair amount in the newsgroup a
> while ago. I think that could save a lot of code down the road (e.g. in
> duration.d), in addition to being a good library artifact of its own.
That would indeed be useful.l Restriting to smaller types isn't enough because
their range is still too large. It's also _highly_ annoying when it comes to
using them because of the necessary conversions. And since the conversions don't
even do the job to get the value down to the exact range required, increasingly
I don't think that they're really worth it.
> * This enum and its kin:
>
> enum AllowDayOverflow
> {
> /// Yes, allow day overflow.
> yes,
>
> /// No, don't allow day overflow.
> no
> }
>
> should be like this:
>
> enum AllowDayOverflow
> {
> /// No, don't allow day overflow.
> no,
> /// Yes, allow day overflow.
> yes
> }
>
> such that code can write:
>
> void fun(AllowDayOverflow allowDayOverflow) {
> ...
> if (allowDayOverflow) ...
> ...
> if (!allowDayOverflow) ...
> }
>
> Right now writing such works but with the reverse effect.
Very good point. I didn't think of that.
>
> * "this(string msg, string file = __FILE__, size_t line = __LINE__,
> Throwable next = null) nothrow" -> the __FILE__ and __LINE__ are useless
> (they are yours, not your caller's). You need to make them template
> parameters to work, see other places in Phobos. At least that's what I
> recall. Walter?
I was _sure_ that the default arguments were filled in at the call site... And
they do! This program
----------- main module
import other;
void main()
{
writeval();
}
--------- other module
import std.stdio;
void writeval(string file = __FILE__, size_t line = __LINE__)
{
writefln("%s(%s)", file, line);
}
prints main.d(5)
Or am I misunderstanding what you're saying?
> * "final class TUnitConv" used as a namespace. Can't we organize things
> such that we introduce few enough names for such namespaces to be unneeded?
Well, with it seemed like a good way to organize some of the code, and when I
originally created it, and I was looking to reduce how much was thrown into the
namespace on importing datetime. And with a to!() function, it seemed even more
critical to create such a namespace.
For Clock, Dur, and IRange, however (which are similar namespaces), I really do
think that the added namespacing is well worth it. It makes the code much
clearer. Dur in particular would be bad to have its functions separated out,
because then you have functions like years(), months(), and days() which are
going to conflict all over the place with variable names.
>
> * "static long to(TUnit tuFrom, TUnit tuTo)(long value)" is a spark of
> brilliance that I'd love to see a lot more of. It's essentially the
> difference between reading and memorizing 90 function names versus
> memorizing one function name and an enum with 10 values. That being
> said, the rest of the code sadly doesn't transform this spark into a
> full-blown explosion - it goes right back to triviality such as e.g.
> hnsecsToDays. Those are hardly needed in the implementation (use
> template constraints) and DEFINITELY not needed in the documentation.
> They just pollute it.
Yes. making the conversion functions which to!() uses private would be a good
idea. I should have thought of that. Originally, I was just going to write them
all, until I realized the cominatorial explosion of functions that that would
cause and ended up creating to!() instead.
> Suggestions: (a) throw TUnitConv away; (b) define this and friends at
> global scope:
>
> long convert(TUnit from, TUnit to)(long);
I'm not sure if convert is descriptive enough (TUnitConv.to!() has the distinct
advantage of being quite explicit even if it isn't very pretty), but given how
little is in TUnitConv, that's probably a good idea.
> That being said, convert() does go against the grain of elaborate typing
> for durations. I'd accept it as a low-level function, but really the
> library currently fosters elaborate types for durations... so there's a
> disconnect here.
The library itself needs it regardless. As for user code, well sometimes they
may very well need to deal with converting seconds to minutes and the like, so I
think that it's worthwhile to have in the API, but it's not something that I'd
expect to be heavily used.
> * To be specific: throw away yearsToMonths et al. No need for them.
Well, they either need to exist internally or be inlined in TUnitConv.to!() (or
whatever it gets renamed to), but I do agree that there's no need for them in
the API.
> * I suggest even experimenting with strings instead of TUnit. Really
> TUnit is a vestige from C++ and Java, but we do accept string template
> parameters so how about
>
> assert(convert!("years", "months")(2) == 24);
>
> etc. Opinions?
My first concern would be performance, but if they're templates and template
functions, then that isn't really an issue. Hmmm... My first reaction to that is
definitely negative, but on further reflection, it is starting to look like at
least a decent idea of not an outright good one. It _feels_ like there should be
something wrong with it, but TUnit is effectily just a bunch of ints that have to
be checked with template constraints, and this is just a bunch of strings that
would have to be checked with template constraints. I'll have to think about it
further, but it is starting to look like a good idea.
> * All string functions in core.d make me nervous because they hint to no
> support for locales, and are very pedestrian as they go. I'd prefer at
> least some sort of table with names for "year", "years" etc. in
> conjunction with reusable code, instead of rigid, nonreusable code with
> inline strings. Same about monthOfYearToString etc.
Understandale. I didn't really know a better what to do it since it's not like
we have internationalzition support built into D or Phobos. Sure, we have
_unicode_ support, but that obviously isn't the same thing. I'll have to think
of a clean way to do that with a table of some kind.
>
> * validXxx and validXxxOfYyy is in desperate need for a template
> approach a la to/convert. Virtually anything that encodes types in
> function names is such a candidate.
It does to some extent, though some of them (like for the day of the month)
require rather different parameters than the rest. Still, that could be dealt
with with a template specialization. There's also not currently any generic way
to define what the values should be tested _against_, so I'll have to think of a
good way to make that work. It does seem like a good idea though.
> * As another random example:
>
> DayOfWeek getDayOfWeek(DayOfGregorianCal day) pure nothrow
>
> could/should be (in client use):
>
> get!(TUnit.dayOfWeek, TUnit.dayOfGregorianCal)(day);
Since those functions would all have to be have specialized version of such
templates, I'm not sure how much you gain with that approach - particularly when
there aren't very many of those functions (and IIIRC they're all package or
private rather than in the API). But it may be worth doing something like that.
I'm more skeptical of this suggestion than the others though.
> * Generally, it's clear from above why I was displeased with core.d. At
> 1952 lines, it's a lot of pain for very little gain. It encodes date
> artifacts in function names leading to an explosion of API functions. It
> uses too little generic mechanisms to reduce both API size and
> implementation size.
datetime.core was entirely intended to hold stuff that most of the other modules
needed rather than be all that useful on its own (as I discussed in my responses
to Walter, I really was looking for everything to be in one module, but the file
was getting to be too large. I attempted to split it up, and I didn't
necessarily do so in the best manner).
And it's mostly helper function stuff, so some of it wasn't necessarily
originally intended to be in the public API, but I put it there at the last
minute since it seemed like a user might find it useful (like the validXXX and
enforceXXX functions).
> * On to duration.d. It's simple: there's no need for three durations.
> Rename JointDuration to Duration and throw everything else. I'd find
> myself hard pressed to produce situations in which efficiency
> considerations make it impossible to cope with a 128-bit representation.
>
> * If you do want to go fine-grained durations without the aggravation,
> use templates. Then you can have Duration!HNSeconds, Duration!Seconds,
> Duration!Months, whatever you want, without inflicting pain on yourself
> and others. If you play your cards right, such durations would match
> 100% the templates you define in core.d and elsewhere so you don't need
> to make a lot of implementation effort. BUT again I think one duration
> is enough. Maximum two - short duration and duration.
I really don't think that this is going to work, and the problem is months and
years (they frequently seem to be the problem) due to the fact that they don't
have a uniform number of days. Originally, I had a Duration struct templated on
TUnit, but it resulted in several problems - the biggest having to do with
addition/subtraction.
You can't convert months or years to units smaller than months, and you can't
convert anything smaller than a month to months or years without a specific date.
So, you can't add a Duration!month and Duration!day no matter how it may seem
like you should be able to. Many durationos _could_ be added - such as
Duration!day and Duration!hour - but the variable sizes of months and years
poses a big problem for that. To fix the problem, you need a duration which joins
the two - hence JointDuration. That way, you can write user code like
Dur.years(7) + Dur.days(2) and have it work. The way I'd originally done that
resulted in a combinatorial explosion of joint duration types (it was effctively
templated on the types of duraton that you'd added together to create it), so I
just combined the months+ and sub-months durations into two types. It cuts down
on the amount of generated code at least.
And your suggestion to just have JointDuration has some merit, but doesn't work
with the current scheme. The problem is that a time point can (conceptually at
least) cover any range of time units. It could be years only or just months and
days, or it could be only seconds. Practically-speaking, at present, they all
cover at minimum years - days, except for TimeOfDay. Right now, with duration
types separating out months+ and sub-months, you can have the compiler enforce
whether a particular duration can be added to a particular type of time point.
So, if you try and add a MonthDuration to a TimeOfDay, it won't compile (since
it doesn't have any months in it). If you tried to just have JointDuration, the
compiler can no longer make such a guarantee, and you'd end up with run time
errors instead of compile-time errors.
Now, we could say that we're only worried about the types of time points which
are defined in std.datetime and are not worried about user-defined time point
types, and then say that we don't care about adding months or years to a
TimeOfDay because it really has no effect when you think about it (regardless of
how many days that is), and then it would be possible to just have a
JointDuration. But for better-or-worse, that would mean no user-defined time
points, or at least, it would become problematic for them. Now, I'm not sure
that we really care about that. In making the code deal with the multiple time
point types that I defined, it was relatively easy to make it so that it would
work with user-defined time points, so I did. That doesn't mean that we need to
care or worry about that. Honestly, I would expect _very_ few programmers to
even ever think about doing it, and the types that I've defined should cover all
but fairly exotic requirements for time points. So, I could definitely look at
reducing the durations down to a single JointDuration.
One problem with that, however, is that that could get really ugly for comparing
durations (which is code that I noticed yesterday that I forgot to add - it's on
the code on my machine now though). How do you compare a duration which holds
both months and smaller units? You could have a duration which is 5 months and
702 days and another which is 15 months and 2 hours. Which is larger? Sure, in
this case, we can see that it's the first one, but once you get to something like
3 months in one and 2 months and 30 days in the other, what do you do? Depending
on which months we're talking about, either of them could be greater or they
could be equal. And since a duration is used to indicate the difference between
two time points, it's a given that programmers will want to compare them.
So, while I think that you have some good points, I'm not sure that what you
suggest really improves things at all. I'm totally open to better suggestions,
but the variable number of days in a month and a in a year is a very annoying
problem here.
Also, it should be noted, that programmers would create all durations by either
using arithmetic on type points or by using the functions in Dur - e.g.
Dur.years(), Dur.days(), or Dur.seconds(), and that combined with auto
eliminates most cases where the programmer needs to care about what the type of
a duration actually is - the one glaring exception being a function parameter
which would then likely need to be templatized. So, while the multiple duration
types is annoying, I think that I've managed to asbstract the issue away for the
most part.
Also, there's TickDuration (which was SHOO's Ticks), which I made to work as a
duration (since it is one) but needs to stay separate for reasons of precision.
Though I don't think that programmers using std.datetime will have much call for
creating those directly.
> * timezone.d looks okay.
>
> * At 23KLOC, timepoint.d better had a walkOnWater() function in there.
>
> * currLocalTime, currUTC, currOtherTZ should be
> current!(TUnit.localTime), current!(TUnit.utc) - bringing again API
> consolidation into discussion.
Hmmm. Well, they're basically just a different time zone class, so it could
become something like currTime(TimeZone tz = LocalTime.instance). I was trying
to make it so that programmers who didn't want to worry about time zones didn't
have to though, so whatever changes were made to reduce function duplication
like this should try and make it so that the programmer can choose to not
specificy the time zone and end up with local time.
> (need to finish this just about here)
>
> * other.d - bad name. Like Walter, I think the physical design should be
> one .di file and one or more .d files.
Well, like I said before, the current module division isn't necessarily all that
great, and what I'd really like is a single module from the perspective of the
users of the module, so .di files seems like a good idea. I wasn't really aware
of them before (I'm sure that I'd seen them mentioned before, but I haven't seen
them discussed much, and I'm not particularly familiar with them), so I didn't
think of that.
> * Duration and Interval should be consolidated.
As in the types or the modules? The types can't be cleanly consolidated because
they're distinct concepts. As for the modules, I have no problem with that. It's
just that Boost (and thus my code) is base on the three concepts of time points,
time durations, and time intervals, so it seemed like a good way to split up the
module.
- Jonathan M Davis
More information about the phobos
mailing list