[phobos] datetime review (new attempt at URL)

Steve Schveighoffer schveiguy at yahoo.com
Thu Oct 14 19:04:12 PDT 2010




----- Original Message ----
> From: Jonathan M Davis <jmdavisProg at gmx.com>
> To: phobos at puremagic.com
> Sent: Thu, October 14, 2010 7:07:34 PM
> Subject: Re: [phobos] datetime review (new attempt at URL)
> 
> On Thursday, October 14, 2010 12:38:57 Steve Schveighoffer wrote:
> > 1. Are  you going to use an extended Gregorian calendar, or use a Julian
> >  calendar for the appropriate dates?  I recommend using extended  Gregorian,
> > because it's much easier to deal with, and anyone who wants  to deal with
> > such historic accuracy should be using a much more complete  library.  In
> > any case, it should be noted what you are  using.
> 
> It uses the Proleptic Gregorian Calendar (so, it uses the Gregorian Calendar
> calculations for its whole length), and it follows ISO 8601 by using 0 for 1
> B.C. It's mentioned in the ddoc comments.

Yes, this is what I meant (except when I did it for Tango, I used 1 B.C. for the 
year before 1 A.D.).  I think I even used the term proleptic, but for some 
reason in my head I thought it was called extended :)

> 
> > 
> > 2. I highly recommend ignoring the concept of leap-seconds, as it just  adds
> > constant maintenance (since leap seconds cannot be predicted) and  doesn't
> > add much to the library.  However, it should be noted  whether you support
> > them.
> 
> PosixTimeZone will support them if you use one of the time zones that has
> them (the ones starting with right/ I believe), but that's it. Since, they
> come from the tz files, no maintenance is required. Whether LocalTime or UTC
> uses leap seconds is completely system-dependent, but I wouldn't expect them
> to (and I know that they won't on posix systems since posix ignores leap
> seconds). So, truth be told, the right/ time zones will act differently if
> used with PosixTimeZone than if you were to have your system time using them.

What I meant was, if you subtract two points in time that cross a leap-second 
boundary, will you take into account the extra second?

> 
> > 
> > ----
> > 
> > I don't like all the aliases for Year, etc.  This accomplishes  almost
> > nothing except documentation.  And even that doesn't add  much.  I don't
> > see the point of doing:
> > 
> > int  foo(Year years)
> > 
> > vs.
> > 
> > int foo(short  years)
> > 
> > Both seem equally documented to me.
> 
> All such aliases have been removed at Andrei's request, though I think that
> one of the main reasons that I used them originally was because he'd put them
> in std.gregorian. Discussion on that is what led to the discussion of a
> bounded integral type (which I'm not worrying about at this point, but
> perhaps we can change std.datetime to use it later).

One of the benefits of using a single value to represent a duration/point in 
time is everything gets normalized.  For example, if you ask for a date/time that 
represents 13/40/2010 at 33 o'clock, it just normalizes out to the correct point 
in time.
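
To make the normalization point concrete, here is a rough sketch that folds
days and hours into one hnsec count and splits it back out.  The names Split
and normalize are made up for illustration and are not from the reviewed
module; months are left out because they have no fixed length.

```d
enum long hnsecsPerHour = 60L * 60 * 10_000_000;
enum long hnsecsPerDay  = 24 * hnsecsPerHour;

/// Hypothetical result type: whole days plus the hour within the day.
struct Split { long days; long hours; }

/// Fold the inputs into a single hnsec count, then split it back out.
Split normalize(long days, long hours) pure nothrow
{
    immutable total = days * hnsecsPerDay + hours * hnsecsPerHour;
    return Split(total / hnsecsPerDay,
                 (total % hnsecsPerDay) / hnsecsPerHour);
}

// "33 o'clock" on day 0 comes out as 9 o'clock on day 1.
static assert(normalize(0, 33) == Split(1, 9));
```

Out-of-range input needs no special casing here; the division and remainder
do all the carrying.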

> 
> > 
> > ----
> > 
> > I like the to!(TUnit, TUnit) conversion function, but I think it  is
> > redundant to also have daysTohnsecs (btw, this isn't properly cased,  I
> > think it should have been daysToHnsecs).  I see the first uses  the other,
> > but a better implementation is possible without requiring all  the others. 
> > Also, you risk unnecessary truncation in your  calculations.
> > 
> > I'd say get rid of all the extra functions.   I would also renumber the enum
> > for TUnit to go from smallest to largest,  and I would rewrite the to
> > conversions as:
> > 
> > // note: reverse this constraint if you reorder the enum
> > template hnsecsPer(TUnit un) if(un >= TUnit.week)
> > {
> >     static if(un == TUnit.hnsec)
> >         enum hnsecsPer = 1L;
> >     else static if(un == TUnit.usec)
> >         enum hnsecsPer = 10L;
> >     else static if(un == TUnit.msec)
> >         enum hnsecsPer = 1000 * hnsecsPer!(TUnit.usec);
> >     else static if(un == TUnit.second)
> >         enum hnsecsPer = 1000 * hnsecsPer!(TUnit.msec);
> >     else static if(un == TUnit.minute)
> >         enum hnsecsPer = 60 * hnsecsPer!(TUnit.second);
> >     else static if(un == TUnit.hour)
> >         enum hnsecsPer = 60 * hnsecsPer!(TUnit.minute);
> >     else static if(un == TUnit.day)
> >         enum hnsecsPer = 24 * hnsecsPer!(TUnit.hour);
> >     else static if(un == TUnit.week)
> >         enum hnsecsPer = 7 * hnsecsPer!(TUnit.day);
> >     else static assert(0);
> > }
> > 
> > ...
> > 
> >     static long to(TUnit tuFrom, TUnit tuTo)(long value) pure nothrow
> >         if(tuFrom >= TUnit.week && tuFrom <= TUnit.hnsec &&
> >            tuTo >= TUnit.week && tuTo <= TUnit.hnsec)
> >     {
> >         static if(tuFrom > tuTo) // from a smaller unit to a larger one
> >             return value / (hnsecsPer!tuTo / hnsecsPer!tuFrom);
> >         else
> >             return value * (hnsecsPer!tuFrom / hnsecsPer!tuTo);
> >     }
> > 
> > (Note --  untested)
> 
> I'm very torn on the ordering of time units when it matters because years
> are larger units than months, etc. but they have a smaller resolution. So,
> for instance, maxResolution() on Date returns days, while minResolution()
> returns years. And if you treat years as the largest unit, then the units
> returned from maxResolution() are actually less than the ones returned from
> minResolution()...

I skipped over maxResolution and minResolution.  But I thought everything in the 
conversion functions used longs?

Oh wait, this is in response to the "reorder the enums" comment, ok.

But now that I'm looking at it, minResolution makes little sense.  It only makes 
sense about crossing the month/year boundary, but this is well defined by 
functions that require it.  In other words, when would minResolution matter?

> 
> As for rewriting the conversions, I've pretty much just taken the bodies of
> the other conversion functions that to!() was using and put them directly in
> to!() (and renamed it to convert!() at Andrei's request).

The template hnsecsPer I defined above can have many uses throughout the 
date/time code, and in other code as well.  I think if you consistently based 
things off of something like that, and consistently used TUnit as a template 
parameter, the code gets much smaller.  Look at my conversion function, it's 4 
lines of code, and handles every possible conversion besides months (with 
maximum resolution).
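
For reference, a self-contained sketch of that idea which should compile as
written.  The cut-down TUnit enum below is an assumption (largest unit first,
year and month omitted since they have no fixed length), and convert is named
after the rename mentioned above rather than copied from the module.

```d
// Assumed, cut-down unit enum; the real one also has year and month.
enum TUnit { week, day, hour, minute, second, msec, usec, hnsec }

/// Number of hnsecs (100 ns ticks) in one `un`.
template hnsecsPer(TUnit un)
{
    static if (un == TUnit.hnsec)
        enum long hnsecsPer = 1;
    else static if (un == TUnit.usec)
        enum long hnsecsPer = 10;
    else static if (un == TUnit.msec)
        enum long hnsecsPer = 1000 * hnsecsPer!(TUnit.usec);
    else static if (un == TUnit.second)
        enum long hnsecsPer = 1000 * hnsecsPer!(TUnit.msec);
    else static if (un == TUnit.minute)
        enum long hnsecsPer = 60 * hnsecsPer!(TUnit.second);
    else static if (un == TUnit.hour)
        enum long hnsecsPer = 60 * hnsecsPer!(TUnit.minute);
    else static if (un == TUnit.day)
        enum long hnsecsPer = 24 * hnsecsPer!(TUnit.hour);
    else static if (un == TUnit.week)
        enum long hnsecsPer = 7 * hnsecsPer!(TUnit.day);
    else
        static assert(0, "unit has no fixed length");
}

/// Convert `value` between fixed-length units, truncating toward zero.
long convert(TUnit from, TUnit to)(long value) pure nothrow
{
    // Compare unit sizes, so the result does not depend on enum ordering.
    static if (hnsecsPer!from >= hnsecsPer!to)
        return value * (hnsecsPer!from / hnsecsPer!to);
    else
        return value / (hnsecsPer!to / hnsecsPer!from);
}

static assert(convert!(TUnit.day, TUnit.hour)(2) == 48);
static assert(convert!(TUnit.msec, TUnit.second)(2500) == 2);
static assert(convert!(TUnit.week, TUnit.hnsec)(1) == 6_048_000_000_000L);
```

Because each ratio is computed at compile time and is always a whole number,
the only truncation left is the final division when going to a larger unit.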

> 
> > 
> > ----
> > 
> > toString!(TUnit) -- this seems like an extremely fringe need.  Not often
> > do I care about printing a value representing seconds.  More often I care
> > about printing a duration, time of day, or a date.  Can we drop this and
> > functions that depend on it?
> > 
> > Also note  that string representation of date/time is one of those things
> > that is  highly sensitive to locale.  Tango has a ginormous library
> >  dedicated to printing locale-dependent stuff including date/time.
> > 
> > I agree that having a default print for date/time is fine, esp.  for
> > debugging, but let's not try to reinvent formatted printing in the  date
> > time module.
> 
> Most of the stuff like toString!(TUnit) is helper code for the few functions
> which actually print them out as strings. It wouldn't really hurt to make
> them private, thereby restricting whatever locale issues they present to
> datetime itself.
> 
> As for printing functions, what I did follows Boost. The various time point
> types have toISOString(), toISOExtendedString(), and toSimpleString(). The
> only one of the three which would care about locale is toSimpleString() since
> it uses an abbreviation for the month in it.
> 
> Other than that, I believe that the only locale-specific stuff is for
> printing out units of time which is done primarily (maybe even solely) in
> Duration. Very little is actually locale-specific. It's primarily ISO stuff
> which ignores locales. As such, we could just stick to English for the little
> that has locale-specific stuff. For instance, I wouldn't expect printing a
> Duration to really need to be locale-specific. I would expect it to be
> primarily for debugging. toSimpleString() (and its corresponding
> fromSimpleString()) would matter a bit more, but I'd expect code that really
> cared about intercommunicating with other stuff would use the ISO or ISO
> Extended strings. However, the simple strings do have the virtue of being
> somewhat more legible, so I don't think that I'd want to get rid of them.

I'm looking at things like UseLongName and all those other specific options that 
really belong in formatting code.  I'd say don't print text versions of 
anything, just print numerical representation in a standard format.  Libs 
outside of datetime will handle text representation, specific to the locale of 
the user.  Trimming out all this string printing stuff will save a lot on code 
size.

> 
> > 
> > ----
> > 
> > timeT2StdTime and stdTime2TimeT -- I really don't like the names  here.  Can
> > we call it C Time?  T has such a known usage as  representing a generic
> > type, this was my immediate thought of what it  does.  In Tango, we called
> > it toUnixTime and fromUnixTime.   Also, don't abbreviate to as 2.  I hate
> > that :)
> 
> I believe that I used the 2 because all of the t's that were already there
> made using To harder to read. It's not as bad if you use the term unixTime
> though.
> 
> > Also, I don't think we need another version of this
> > (fromTimeTEpoch2StdTimeEpoch), if you have the wrong unit, just use the to!
> > functions defined above.
> 
> Hmm. The problem is that stdTime2TimeT() and its reverse are not only
> converting between epochs but also (at least potentially) converting between
> types (depending on the size of time_t on the system in question) as well as
> the unit type, and stdTime2TimeT() and its reverse have special code for
> handling the fact that time_t and long aren't necessarily the same size, so
> I'm not sure how you could really combine them with fromTimeTEpoch2StdTime()
> and its reverse. They do similar but different things.

// pseudocode
immutable UnixEpoch = StdTime(1/1/1970);

// if you want to convert to unix epoch time, use:
x - UnixEpoch;

If you want to convert from a time_t, use the defined special function.  Note, I 
resisted in Tango for a long time defining the to/fromUnixTime functions, but 
people complained constantly -- it will definitely be a used feature :)  But 
nobody asked for converting epochs.
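
Spelled out a bit more, the epoch shift is a subtraction and a unit change.
This is only a sketch: it works on plain longs and deliberately skips the
time_t size handling discussed above, and the function names are placeholders
rather than the module's API.

```d
enum long hnsecsPerSecond = 10_000_000;

/// hnsecs from midnight, January 1st, 1 A.D. to midnight, January 1st, 1970
/// in the proleptic Gregorian calendar (719,162 days).
enum long unixEpoch = 621_355_968_000_000_000L;
static assert(unixEpoch == 719_162L * 24 * 60 * 60 * hnsecsPerSecond);

/// Std time (hnsecs since 1 A.D.) to whole seconds since the Unix epoch.
/// Integer division truncates toward zero, so fractions are dropped.
long stdTimeToUnixTime(long stdTime) pure nothrow
{
    return (stdTime - unixEpoch) / hnsecsPerSecond;
}

/// Whole seconds since the Unix epoch to std time.
long unixTimeToStdTime(long unixTime) pure nothrow
{
    return unixTime * hnsecsPerSecond + unixEpoch;
}

static assert(stdTimeToUnixTime(unixEpoch) == 0);
static assert(unixTimeToStdTime(1) == unixEpoch + hnsecsPerSecond);
```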

> > I'll echo what I've read  from Andrei -- I don't like using classes as
> > namespaces.  Find  another way.
> 
> At the moment, it's down to just Clock and IRange. And personally, I think
> that they really improve code legibility, so for the moment, I'm leaving them
> in. If when it finally comes down to it, Andrei absolutely insists that they
> go, then I'll obviously have to get rid of them, but personally, I think that
> they definitely make the code that uses them easier to understand (especially
> for IRange).

I'll wait for your updated code before passing judgment ;)

> > General nitpick comment, your ddoc is over  80 chars wide (over 100 in some
> > spots), can you fix this?  I don't  mind code being wider than necessary,
> > but comments should fit within an  80xN terminal.
> 
> I may take the time to fix it, but I generally find trying to restrict stuff
> to 80 characters to be highly annoying, and I won't even consider doing it
> for code - that definitely harms code readability. The only use case that
> makes any sense to me to really care about is for if you're trying to print
> out code on paper, since (unless you're printing landscape) the number of
> columns on paper is pretty limited.
> 
> How many people even use 80xN terminals? There are so many better options.
> So, I may or may not take the time to fix the ddoc comments to fit in 80
> characters, but I don't see much point, and I have enough else to do that I
> may not get around to it.

At least one :)  I usually do D development by starting an xterm and loading 
vim.

It's a simple few keystrokes in my vim to do it (per paragraph), if you want I 
can reformat them once you are done.  I have certain OCD issues, and this is one 
of them :)

> 
> > In e.g. HNSecDuration.msecs, you are using a very  convoluted way to get the
> > number of milliseconds :)  Use mod  instead.
> 
> Hmmm. That convoluted way is pretty much necessary for larger units, but I
> guess that it wouldn't be for msecs, usecs, or hnsecs. Actually, I'm halfway
> tempted to make it use FracSec instead, since splitting out the msecs, usecs,
> and hnsecs doesn't make sense in the same way that days, hours, minutes, etc.
> do. You're really dealing with the fractional seconds beyond the second at
> different levels of precision, not full-on separate units.

Here's what I mean.  Change:

    @property long msecs() const pure nothrow
    {
        auto hnsecs = removeUnitsFromHNSecs!(TUnit.week)(_hnsecs);
        hnsecs = removeUnitsFromHNSecs!(TUnit.day)(hnsecs);
        hnsecs = removeUnitsFromHNSecs!(TUnit.hour)(hnsecs);
        hnsecs = removeUnitsFromHNSecs!(TUnit.minute)(hnsecs);
        hnsecs = removeUnitsFromHNSecs!(TUnit.second)(hnsecs);

        return getUnitsFromHNSecs!(TUnit.msec)(hnsecs);
    }

to
    @property long msecs() const pure nothrow
    {
        return (_hnsecs % hnsecsPer!(TUnit.second)) / hnsecsPer!(TUnit.msec);
    }

Note, TUnit.second could be replaced with (TUnit.msec + 1) or whatever, and 
then you have a pretty well defined pattern that can be changed into a template.

Hey look, there's that cool hnsecsPer template again :)

> 
> > 
> > And in general, can we just use a  template?  If we continuously
> > parameterize everything based on the  unit type, generic programming is
> > going to be much easier.
> > 
> > i.e.
> > 
> > @property long get(TUnit unit)() {...}
> > 
> > alias get!(TUnit.msec) msecs;
> > ...
> 
> That's going to make for an annoyingly large number of static ifs, but it
> probably should be done.

No static ifs necessary if you use hnsecsPer.
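
Roughly what I have in mind, as a free-function sketch rather than the actual
HNSecDuration member.  The cut-down TUnit enum (largest unit first) is an
assumption, and this version does keep one static if for the largest unit,
which has nothing above it to take the remainder against.

```d
// Assumed, cut-down unit enum; the real one also has year and month.
enum TUnit { week, day, hour, minute, second, msec, usec, hnsec }

/// Number of hnsecs (100 ns ticks) in one `un`.
template hnsecsPer(TUnit un)
{
    static if (un == TUnit.hnsec)
        enum long hnsecsPer = 1;
    else static if (un == TUnit.usec)
        enum long hnsecsPer = 10;
    else static if (un == TUnit.msec)
        enum long hnsecsPer = 1000 * hnsecsPer!(TUnit.usec);
    else static if (un == TUnit.second)
        enum long hnsecsPer = 1000 * hnsecsPer!(TUnit.msec);
    else static if (un == TUnit.minute)
        enum long hnsecsPer = 60 * hnsecsPer!(TUnit.second);
    else static if (un == TUnit.hour)
        enum long hnsecsPer = 60 * hnsecsPer!(TUnit.minute);
    else static if (un == TUnit.day)
        enum long hnsecsPer = 24 * hnsecsPer!(TUnit.hour);
    else
        enum long hnsecsPer = 7 * hnsecsPer!(TUnit.day);
}

/// The `un` component of a duration given in hnsecs, with every larger
/// unit stripped off first.
long get(TUnit un)(long hnsecs) pure nothrow
{
    static if (un == TUnit.week)
        return hnsecs / hnsecsPer!un;
    else
    {
        // In this enum order, un - 1 is the next larger unit.
        enum larger = cast(TUnit)(un - 1);
        return (hnsecs % hnsecsPer!larger) / hnsecsPer!un;
    }
}

alias seconds = get!(TUnit.second);
alias msecs   = get!(TUnit.msec);
alias usecs   = get!(TUnit.usec);

// 1 s, 234 ms, 567 us and 8 hnsecs
static assert(seconds(12_345_678) == 1);
static assert(msecs(12_345_678) == 234);
static assert(usecs(12_345_678) == 567);
```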

> > JointDuration -- aside from operators, do we need to  wrap the other methods
> > of HNSecDuration and MonthDuration?  Can we  just provide accessors for the
> > MonthDuration and HNSecDuration (in fact,  you may want this).
> 
> Well, sadly, I removed MonthDuration and JointDuration, so that's no longer
> an issue.

Understood, I responded to your first message without reading most of the 
responses (I had about 100 unread phobos messages to go through, and I wanted to 
express my opinions without reading all of them first).

> > TimeOfDay: Can we use HNSecDuration with an invariant that it's < 24 hours?
> > I can't see why you'd want to reimplement all this.  FWIW, Tango uses
> > Span (the duration type) as its time-of-day component.
> > 
> > Date is  one thing -- the durations are based on a point in time.  But time
> >  of day is always the same no matter the day.
> 
> Hmm. I didn't think of that. It doesn't quite work though. Aside from the
> fact that that would take more memory (for better or for worse), stuff like
> rollHours() wouldn't work if you just used an HNSecDuration, and if you're
> wrapping an HNSecDuration, you have to do a lot more calculations for that
> sort of code. I think that the result is a net-loss, personally.

More memory for more resolution and fewer types to deal with (no FracSec).

But let's calculate it out, there are 10 * 1000 * 1000 * 60 * 60 * 24 hnsecs in 
a day.  That comes out to require 5 bytes of storage (max value 864 billion).  
In the grand scheme of things 3 wasted bytes aren't that bad.
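
The arithmetic can be checked at compile time.  The constant name below is
just for illustration.

```d
enum long hnsecsPerDay = 10L * 1000 * 1000 * 60 * 60 * 24;

static assert(hnsecsPerDay == 864_000_000_000L); // 864 billion
static assert(hnsecsPerDay > uint.max);          // too big for 4 bytes
static assert(hnsecsPerDay < (1L << 40));        // fits in 5 bytes
static assert(long.sizeof - 5 == 3);             // 3 bytes of a long unused
```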

as for rollHours, I skipped over that as well.  Let me read it...

Ah ok, here's a complete implementation (with hnsecsPer):

long roll(TUnit un)(long orig, long nOfUnit)
{
   // un - 1 is the next larger unit in the current enum order
   enum larger = hnsecsPer!(cast(TUnit)(un - 1));
   auto x = orig / larger * larger;
   return (orig + nOfUnit * hnsecsPer!(un)) % larger + x;
}

and accompany this with the ensuing aliases e.g.:

alias roll!(TUnit.day) rollDays;
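
A compilable sketch of the same idea with a usage check.  It assumes a
cut-down TUnit enum (largest unit first) and a non-negative starting value,
and it adds one step the version above does not have: a negative remainder
is wrapped back into range so that rolling backwards also works.

```d
// Assumed, cut-down unit enum; the real one also has year and month.
enum TUnit { week, day, hour, minute, second, msec, usec, hnsec }

/// Number of hnsecs (100 ns ticks) in one `un`.
template hnsecsPer(TUnit un)
{
    static if (un == TUnit.hnsec)
        enum long hnsecsPer = 1;
    else static if (un == TUnit.usec)
        enum long hnsecsPer = 10;
    else static if (un == TUnit.msec)
        enum long hnsecsPer = 1000 * hnsecsPer!(TUnit.usec);
    else static if (un == TUnit.second)
        enum long hnsecsPer = 1000 * hnsecsPer!(TUnit.msec);
    else static if (un == TUnit.minute)
        enum long hnsecsPer = 60 * hnsecsPer!(TUnit.second);
    else static if (un == TUnit.hour)
        enum long hnsecsPer = 60 * hnsecsPer!(TUnit.minute);
    else static if (un == TUnit.day)
        enum long hnsecsPer = 24 * hnsecsPer!(TUnit.hour);
    else
        enum long hnsecsPer = 7 * hnsecsPer!(TUnit.day);
}

/// Add `nOfUnit` units of `un` to `orig` (in hnsecs, assumed non-negative),
/// wrapping around inside the next larger unit instead of carrying into it.
long roll(TUnit un)(long orig, long nOfUnit) pure nothrow
    if (un > TUnit.week)
{
    enum larger = hnsecsPer!(cast(TUnit)(un - 1));
    immutable base = orig / larger * larger; // start of the enclosing unit
    long r = (orig % larger + nOfUnit * hnsecsPer!un) % larger;
    if (r < 0)
        r += larger; // D's % keeps the sign of the dividend
    return base + r;
}

alias rollHours   = roll!(TUnit.hour);
alias rollMinutes = roll!(TUnit.minute);

enum long hour   = hnsecsPer!(TUnit.hour);
enum long minute = hnsecsPer!(TUnit.minute);

// 23:30 rolled forward two hours is 01:30, not 25:30.
static assert(rollHours(23 * hour + 30 * minute, 2) == 1 * hour + 30 * minute);
// 00:30 rolled back one hour is 23:30.
static assert(rollHours(30 * minute, -1) == 23 * hour + 30 * minute);
// 10:59 rolled forward two minutes is 10:01; the hour is untouched.
static assert(rollMinutes(10 * hour + 59 * minute, 2) == 10 * hour + minute);
```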

> 
> > 
> > I'd expect the following  structs in timepoint.d:
> > 
> > Date -- a date with the fields year month day
> > Time -- A HNSecDuration with the limitation that it must be less than 24
> >         hours
> > DateTime -- a combination of both Date and Time
> > 
> > As far as time zone, I've not yet dealt with it.  It  was on my plate to add
> > time zones to Tango, but I never got around to  it.  I think a type that
> > combines a point in time with a time zone  might be the best solution,
> > similar to how an interval combines a point  in time with a duration.
> 
> SysTime combines the time in hnsecs from midnight January 1st, 1 AD UTC with
> a time zone object.
> 
> > 
> >  You have other time types in timepoint which I think are not necessary. 
> >  Like FracSec.
> 
> I think that it's a very good idea to have FracSec. The reason is that when
> you're dealing with the units smaller than a second, it doesn't really make
> much sense to treat them individually anymore. They're just differing
> precisions/resolutions for/of the same thing - the fractional seconds. So, I
> think that it's clearer and cleaner with FracSec.

But we already have that in HNSecDuration.

> 
> >  OK, so that's what I have.  I think it's a very well thought out lib,  it
> > just needs to be trimmed down.
> > 
> > One final thought --  after reading through all the stuff, unit tests take
> > up the vast bulk of  the lines of code.  I think it's safe to say the .di
> > file would be  like 5000 LOC.  I think it definitely should be one file.
> 
> I'll be turning it into a single .di file / .d file pair, and I concur that
> the .di file will not be particularly large, but if all the ddoc comments are
> in there, I have no idea how large that would make it. We'll see. But there's
> no question that the unit tests take up the most space (and they've been a
> life-saver too; much of the code would likely be broken in a lot of subtle
> ways if I didn't have them), and that cost in space won't transfer over to
> the .di file.

I'd say just the d file please.  The only point of having a .di file is to hide 
implementation.  There are almost no .di files in phobos/druntime except where 
it makes sense (mostly druntime).  And there is only one manually maintained .di 
file -- object.di.

-Steve



      

