The Trouble with MonoTimeImpl (including at least one bug)
Jonathan M Davis
newsgroup.d at jmdavisprog.com
Wed Apr 3 00:09:25 UTC 2024
On Tuesday, April 2, 2024 12:35:24 PM MDT Forest via Digitalmars-d wrote:
> I'm working on code that needs to know not only how much time has
> elapsed between two events, but also the granularity of the
> timestamp counter. (It's networking code, but I think granularity
> can be important in multimedia stream synchronization and other
> areas as well. I would expect it to matter in many of the places
> where using MonoTimeImpl.ticks() makes sense.)
>
> For clarity, I will use "units" to mean the counter's integer
> value, and "steps" to mean the regular increases in that value.
>
> POSIX exposes counter granularity as nanoseconds-per-step via
> clock_getres(), and MonoTimeImpl exposes its reciprocal (1/n) via
> ticksPerSecond(). I'm developing on linux, so this appeared at
> first to be sufficient for my needs.
>
> However, I discovered later that ticksPerSecond() doesn't return
> counter granularity on Windows or Darwin. On those platforms, it
> returns units-per-second instead: the precision of one unit,
> rather than one step. This is problematic because:
>
> - The function returns conceptually different information
> depending on the platform.
> - The API offers the needed granularity information on only one
> platform.
> - The API is confusing, by using the word "ticks" for two
> different concepts.
>
>
> I think this has gone unnoticed due to a combination of factors:
>
> - Most programs do only simple timing calculations that don't
> include a granularity term.
>
> - There happens to be a 1:1 unit:step ratio in some common cases,
> such as on my linux box when using MonoTime's ClockType.normal.
>
> - On Windows and Darwin, MonoTimeImpl uses the same fine-grained
> source clock regardless of what ClockType is selected. It's
> possible that these clocks have a 1:1 unit:step ratio as well.
> (Unconfirmed; I don't have a test environment for these
> platforms, and I haven't found a definitive statement in their
> docs.)
>
> - On POSIX, although selecting ClockType.coarse should reveal the
> problem, it turns out that ticksPerSecond() has a special case
> when clock steps are >= 1us, that silently discards the
> platform's clock_getres() result and uses a hard-coded value
> instead. (Bug #24446.) That value happens to yield a 1:1
> unit:step ratio, hiding the problem.
>
>
> Potential fixes/improvements:
>
> 1. Give MonoTimeImpl separate functions for reporting
> units-per-second and steps-per-second (or some other
> representation of counter granularity, like units-per-step) on
> all platforms.
>
> 2. Remove the special case described in bug #24446. I suspect the
> author used that hard-coded value not because clock_getres() ever
> returned wrong data, but instead because they misunderstood what
> clock_getres() does. (Or if *I* have misunderstood it, please
> enlighten me.)
>
> 3. Implement ClockType.coarse with an actually-coarse clock on
> all platforms that have one. This wouldn't solve the above
> problems, but it would give programmers access to a presumably
> more efficient clock and would allow them to avoid Apple's extra
> scrutiny/hoops for use of a clock that can fingerprint devices.
> https://developer.apple.com/documentation/kernel/1462446-mach_absolute_time
>
>
> Open questions for people who use Win32 or Darwin:
>
> Does Win32 have an API to get the granularity (units-per-step or
> steps-per-second) of QueryPerformanceCounter()?
>
> Does Darwin have such an API for mach_absolute_time()?
>
> If the unit:step ratio of the Win32 or Darwin clocks is always
> 1:1, is that clearly documented somewhere official?
>
> Do either of those platforms offer a coarse monotonic clock?
Well, it would appear that the core problem here is that you're trying to
use MonoTime in a way that it was not designed for and that wasn't even
thought of when it was written.
For some history here: previously, we had TickDuration, which was used both
as a point in time / timestamp of the monotonic clock and as a duration in
ticks of that clock. It conflated the two, making it potentially quite
confusing to use (and making it so that the type system couldn't
differentiate between a monotonic timestamp and a duration in ticks).
MonoTime was written to replace it and to be only a timestamp, with Duration
being left as the only way to represent durations of time (or at least, the
only way to represent durations of time with units).
Originally, MonoTime simply used the normal monotonic clock on the system.
There was no way to configure it. However, when std.logger was originally
being written, it was using MonoTime fairly heavily (I don't know what it
does now), and they were finding that it was too much of a performance hit
to be getting the monotonic time as frequently as they were. So, to improve
the situation, I created MonoTimeImpl as a type which was templated on the
type of the monotonic clock, and I made MonoTime an alias to that which used
the normal monotonic clock. That way, the logger stuff could use the coarse
clock on POSIX systems (other than Mac OS X), and since I was templating it
on the clock type, I added support for several of the clock types available
on POSIX systems. For simplicity, I also made it so that systems which
didn't actually support a coarse clock would just use the normal monotonic
clock so that stuff like std.logger wouldn't have to worry about whether a
particular system actually supported the coarse clock. It would just get the
coarse clock if it was available and get the normal clock otherwise. But I
was not aware of either Windows or Mac OS X having alternative monotonic
clocks, so they got none of that support. They just get the normal clock,
and trying to use the coarse clock on those systems gives you the normal
clock.
But the basic design of MonoTime did not change as part of this. The other
clocks were really just added for performance reasons. And as far as using
MonoTime/MonoTimeImpl goes, it was intended that the type of the clock be
irrelevant to its usage.
As for the basic design behind MonoTime, the idea was that you were going to
do something like
    immutable start = MonoTime.currTime;
    // ... the work being timed ...
    immutable end = MonoTime.currTime;
    immutable timeElapsed = end - start;
That's it. It was never intended to tell you _anything_ about the monotonic
clock. It was just for getting the time from the monotonic clock so that you
could determine how much time had elapsed when you subtracted two monotonic
times.
The only reason that ticks and ticksPerSecond were exposed was because
Duration uses hecto-nanoseconds, whereas the monotonic clock will be more
precise than that on some systems, so ticks and ticksPerSecond were provided
so that anyone who needed units more precise than hecto-nanoseconds could do
the math themselves. So, while ideally, ticksPerSecond corresponds to the
resolution of the system clock, it was never intended as a way to decide
anything about the system clock. It's there purely so that the appropriate
math can be done on the difference between two different values for ticks,
and it never occurred to me that anyone might try to do anything else with
it - if nothing else, because I've never encountered code that cared about
anything about the monotonic time beyond using it to get the difference
between two points in time without worrying about the system clock changing
out from under you, as can happen with the real-time clock.
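That intended use looks something like this minimal sketch (convClockFreq is
the public conversion helper in core.time; the wrapper function is just for
illustration):
    import core.time : MonoTime, convClockFreq;

    void example()
    {
        immutable start = MonoTime.currTime;
        // ... the work being timed ...
        immutable end = MonoTime.currTime;

        // Duration bottoms out at hecto-nanoseconds, so for anything
        // finer, do the math on the raw ticks instead:
        immutable long elapsedTicks = end.ticks - start.ticks;
        immutable long elapsedNsecs =
            convClockFreq(elapsedTicks, MonoTime.ticksPerSecond, 1_000_000_000);
    }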
In principle, ticks is supposed to be a number from the system clock
representing the number of ticks of the system clock, with the exact number
and its meaning being system-dependent. How often the number that you can
get from the system is updated was not considered at all relevant to the
design, and it never occurred to me to call the number anything other than
ticks, because in principle, it represents the current tick of the system
clock when the time was queried. Either way, it's purely a monotonic
timestamp. It's not intended to tell you anything about how often it's
updated by the system.
ticksPerSecond is then how many of those ticks of the system clock there are
per second so that we can do the math to figure out the actual duration of
time between two ticks of the system clock.
Windows and Mac OS X both provide what is basically the correct API for
that. They have a function which gives you the current tick of the system
clock, and they have a function which tells you what you need to know to
know how many ticks there are per second (though both of them provide that
in a slightly odd way instead of as just an integer value).
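Roughly, and with hypothetical helper names (currTicks, ticksPerSec) of my
own, that maps to something like this sketch (druntime's actual code differs
in the details):
    version (Windows)
    {
        import core.sys.windows.winbase : QueryPerformanceCounter,
                                          QueryPerformanceFrequency;
        import core.sys.windows.winnt : LARGE_INTEGER;

        // The current tick of the system clock.
        long currTicks()
        {
            LARGE_INTEGER count;
            QueryPerformanceCounter(&count);
            return count.QuadPart;
        }

        // Ticks per second, from the system (via a LARGE_INTEGER union
        // rather than a plain integer).
        long ticksPerSec()
        {
            LARGE_INTEGER freq;
            QueryPerformanceFrequency(&freq);
            return freq.QuadPart;
        }
    }
    else version (Darwin)
    {
        // Prototypes declared by hand to keep the sketch self-contained.
        extern(C) nothrow @nogc
        {
            struct mach_timebase_info_data_t { uint numer; uint denom; }
            int mach_timebase_info(mach_timebase_info_data_t*);
            ulong mach_absolute_time();
        }

        // The current tick of the system clock.
        long currTicks() { return cast(long) mach_absolute_time(); }

        // mach_timebase_info gives a numer/denom ratio for converting
        // ticks to nanoseconds, so ticks per second is
        // 10^9 * denom / numer.
        long ticksPerSec()
        {
            mach_timebase_info_data_t info;
            mach_timebase_info(&info);
            return 1_000_000_000L * info.denom / info.numer;
        }
    }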
Unfortunately, other POSIX systems don't have functions that work that way.
Instead, they reused the same function and data type that's used with the
real-time clock but changed one of the arguments to let you tell it to use a
different clock. So, instead of getting the ticks of the system clock, you
get the duration in nanoseconds. So, to make that fit the model of ticks and
ticks-per-second, we have to convert that to ticks. For the purposes of
MonoTime and how it was designed to be used, we could have just always made
it nanoseconds and assumed nanosecond resolution for the system clock, but
instead, it attempts to get the actual resolution of the system's monotonic
clock and convert the nanoseconds back to that. Obviously, that's also
better for anyone who wants to do something with the actual clock
resolution, but really, it was done just to try to make the implementation
consistent with what happens on other platforms. It never occurred to me that
anyone would even be trying to query the clock resolution to do something
with that information besides what was necessary to convert a duration in
ticks to a duration in seconds or fractional seconds.
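As a simplified sketch of that conversion (not druntime's exact code, and it
assumes clock_getres reported something sane, which is where the trouble
below comes in):
    import core.sys.posix.time;

    void example()
    {
        // clock_gettime reports a timespec (seconds plus nanoseconds),
        // not a raw tick count.
        timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        immutable long nsecs = ts.tv_sec * 1_000_000_000L + ts.tv_nsec;

        // clock_getres reports the clock's resolution, also as a timespec.
        timespec res;
        clock_getres(CLOCK_MONOTONIC, &res);

        // Convert the nanoseconds back to "ticks" at that resolution,
        // e.g. res.tv_nsec == 100 means 10_000_000 ticks per second.
        immutable long ticksPerSecond = 1_000_000_000L / res.tv_nsec;
        immutable long ticks = nsecs / res.tv_nsec;
    }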
And honestly, if it had occurred to me that anyone would ever consider using
ticksPerSecond for anything other than doing math on the difference between
two values of ticks, I probably would have provided a function for giving
the difference between two ticks in nanoseconds rather than exposing either
ticks or ticksPerSecond, since then everything about how that's calculated
or stored could be considered to be entirely an implementation detail,
particularly since trying to expose information about the system clock gets
kind of hairy given that the APIs on each OS are completely different. And
that way, all of the issues that you're dealing with right now wouldn't have
even come up. You may have then created a feature request to get what you
were looking for, but at least you wouldn't have been confused about what
MonoTime was doing.
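Something along these lines, which is purely hypothetical and not part of
core.time today:
    import core.time : MonoTime, convClockFreq;

    // Hypothetical: expose only the elapsed nanoseconds between two
    // timestamps, keeping ticks and ticksPerSecond as implementation
    // details.
    long nsecsElapsed(MonoTime earlier, MonoTime later)
    {
        return convClockFreq(later.ticks - earlier.ticks,
                             MonoTime.ticksPerSecond, 1_000_000_000);
    }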
As for the oddities with how ticksPerSecond is set when a weird value is
detected, I did some digging, and it comes from two separate issues.
The first - https://issues.dlang.org/show_bug.cgi?id=16797 - has to do with
how apparently on some Linux systems (CentOS 6.4/7 were mentioned as
specific cases), clock_getres will report 0 for some reason, which was
leading to a division by zero. So, to make that work properly, it was set to
nanosecond precision just like in the case that already existed.
As for the other case, it looks like that was an issue that predates
MonoTime and was originally fixed in TickDuration. The original PR was
https://github.com/dlang/druntime/pull/88
The developer who created that PR reported that with clock_getres, some
Linux kernels were giving a bogus value that was close to 1 millisecond when
he had determined that (on his system at least) the actual resolution was 1
nanosecond. And while it's not exactly ideal to just assume that the
resolution is 1 nanosecond in that case, it works perfectly well, because on
those systems, the way that the monotonic time is actually reported is in
nanoseconds. So, to fix those systems, that's what we did. And naturally,
that fix made it into MonoTime when it was created so that MonoTime would
also work correctly on those systems.
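Combined, those two fixes boil down to the special case that you ran into,
which looks something like this (a paraphrase of the check in druntime's
core.time, not the verbatim code; detectTicksPerSecond is just an
illustrative name):
    import core.sys.posix.time;

    long detectTicksPerSecond()
    {
        timespec res;
        if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
            assert(0, "clock_getres failed");

        // Zero resolutions (issue 16797) and suspiciously coarse ones
        // (>= 1us, covering the bogus ~1 ms values from druntime PR 88)
        // are discarded in favor of assuming nanosecond resolution.
        if (res.tv_sec != 0 || res.tv_nsec <= 0 || res.tv_nsec >= 1000)
            return 1_000_000_000L;
        return 1_000_000_000L / res.tv_nsec;
    }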
I don't know if either of those fixes is still required, but neither of them
violates the intended use case for MonoTime, and since it never occurred to
me that someone would try to use ticksPerSecond for anything other than
doing some math on ticks, I thought that it was acceptable to have
ticksPerSecond be nanosecond resolution in those cases even if it wasn't
ideal. And the exact resolution at which MonoTime gives up and decides to
just use nanosecond resolution doesn't matter all that much in that case
either, because the math is still all going to come out correctly. And at
the time it was written, the coarse clock didn't even enter into the
equation, so realistically, the only affected systems were ones that were
reporting bizarrely low clock resolutions with clock_getres.
Now, obviously, if you're looking to get the actual resolution of the
monotonic clock, the current situation is going to cause problems on any
system where clock_getres reports a value that causes MonoTime to decide that
it's invalid and that it needs to use nanoseconds instead. So, the question
becomes what we do about this situation.
As for your discussion of units vs steps, I have no clue how to expose that
in any API. On Windows and Mac OS X, we get a number from the system clock
with no information about how often it's updated - either internally or with
regards to the number that we get when making the function call. Because
they're ticks of the monotonic clock, I called them ticks. But how they're
incremented is an implementation detail of the system, and the documentation
doesn't say. All I know is that they're incremented monotonically.
So, I guess that what I'm calling ticks, you would call units (Mac OS X
appears to call them tick units -
https://developer.apple.com/documentation/kernel/1462446-mach_absolute_time),
but they're what the system provides, and I'm not aware of anything from
those APIs which would allow me to provide any information about what you're
calling steps. Those function calls update at whatever rate they update, and
without something in the documentation or some other function which provides
more information, I don't see how to provide an API that would give that
information.
On Linux and the BSDs, the situation isn't really any better. Instead of
getting the "tick units", we get nanoseconds. We can then use clock_getres
to get the resolution of a given system clock, which then indicates how many
units there are per second, but it doesn't say anything about steps. Though
looking over your post again, it sounds like maybe you interpret the clock
resolution as being steps? I've always taken it to mean the resolution of
the actual clock, not how often the value from clock_gettime is updated. If
I understand what you're calling units and steps correctly, that would mean
that clock_getres gives the number of units per second, not the number of
steps per second.
But even if clock_getres actually returns steps, if there is no equivalent
on Windows and Mac OS X, then it's not possible to provide a cross-platform
solution.
So, I don't see how MonoTime could provide what you're looking for here.
Honestly, at this point, I'm inclined to just make it so that ticksPerSecond
is always nanoseconds on POSIX systems other than Mac OS X. That way, we're
not doing some math that is pointless if all we're trying to do is get
monotonic timestamps and subtract them. It should also improve performance
slightly if we're not doing that math. And of course, it completely avoids
any of the issues surrounding clock_getres reporting bad values of any kind on
any system. Obviously, either way, I need to improve the documentation on
ticks and ticksPerSecond, because they're causing confusion.
If there are functions of some kind that could give you the additional
information that you're looking for, then we can look at adding them. But
all that MonoTime was designed to deal with was what you're calling units,
and I don't see how we can add anything about steps, because I don't know
how we'd get that information from the system.
As for adding coarse clocks on other systems, if there are coarse clocks on
other systems, then I'm all for adding them. But I'm not aware of any such
clocks on either Windows or Mac OS X. That's why they're not part of
MonoTime.
- Jonathan M Davis