The Trouble with MonoTimeImpl (including at least one bug)
Forest
forest at example.com
Tue Apr 9 19:22:47 UTC 2024
On Wednesday, 3 April 2024 at 00:09:25 UTC, Jonathan M Davis
wrote:
> Well, it would appear that the core problem here is that you're
> trying to use MonoTime in a way that it was not designed for, or
> even thought of, when it was written.
I believe you. I think what I've been trying to do is reasonable,
though, given that the docs and API are unclear and the source
code calls a platform API that suggests I've been doing it right.
Maybe this conversation can lead to improvements for future users.
> In principle, ticks is supposed to be a number from the system
> clock representing the number of ticks of the system clock,
> with the exact number and its meaning being system-dependent.
> How often the number that you can get from the system is
> updated was not considered at all relevant to the design, and
> it never occurred to me to call the number anything other than
> ticks, because in principle, it represents the current tick of
> the system clock when the time was queried. Either way, it's
> purely a monotonic timestamp. It's not intended to tell you
> anything about how often it's updated by the system.
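Just so we're talking about the same thing, this is the
take-two-timestamps-and-subtract pattern I understand MonoTime was
designed for (a minimal sketch using the documented API):

    import core.time : Duration, MonoTime;
    import std.stdio : writeln;

    void main()
    {
        MonoTime before = MonoTime.currTime;
        // ... the work being timed ...
        MonoTime after = MonoTime.currTime;
        Duration elapsed = after - before; // tick units don't matter here
        writeln("elapsed: ", elapsed);
    }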
The use of "ticks" has been throwing me. I am familiar with at
least two common senses of the word:
1. A clock's basic step forward, as happens when a mechanical
clock makes a tick sound. It might take a fraction of a second,
or a whole second, or even multiple seconds. This determines
clock resolution.
2. One unit in a timestamp, which determines timestamp
resolution. On some clocks, this is the same as the first sense,
but not on others.
From what you wrote above, I *think* you've generally been using
"ticks" in the second sense. Is that right? [Spoiler: Yes, as
stated toward the end of your response.]
If so, and if the API's use of "ticks" is intended to be that as
well, then I don't see why ticksPerSecond() calls clock_getres(),
which measures "ticks" in the first sense of the word. (That is my
reading of the glibc man page, and it is confirmed by the test
program I wrote for issue #24446.)
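For reference, the probe I have in mind is along these lines (a
trimmed-down sketch, not the exact program attached to the issue):

    import std.stdio : writefln;
    import core.sys.posix.time : CLOCK_MONOTONIC, clock_getres,
        clock_gettime, timespec;

    void main()
    {
        timespec res, now;

        // Sense 1: the clock's step size (its resolution).
        clock_getres(CLOCK_MONOTONIC, &res);

        // Sense 2: a timestamp whose units are nanoseconds,
        // however coarse the steps reported above may be.
        clock_gettime(CLOCK_MONOTONIC, &now);

        writefln("resolution: %d s %d ns", res.tv_sec, res.tv_nsec);
        writefln("timestamp:  %d s %d ns", now.tv_sec, now.tv_nsec);
    }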
> Unfortunately, other POSIX systems don't have functions that
> work that way. [...] So, instead of getting the ticks of the
> system clock, you get the duration in nanoseconds. So, to make
> that fit the model of ticks and ticks-per-second, we have to
> convert that to ticks. For the purposes of MonoTime and how it
> was designed to be used, we could have just always made it
> nanoseconds and assumed nanosecond resolution for the system
> clock, but instead, it attempts to get the actual resolution of
> the system's monotonic clock and convert the nanoseconds back
> to that.
Ah, so it turns out MonoTime is trying to represent "ticks" in
the first sense (clock steps / clock resolution). That explains
the use of clock_getres(), but it's another source of confusion,
both because the API doesn't include anything to make that
conversion useful, and because ticksPerSecond() has that
hard-coded value that sometimes renders the conversion incorrect
(issue #24446).
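To spell out the conversion I mean (the names and arithmetic here
are mine, simplified from my reading of the code; the real
implementation also has to avoid overflow):

    // Nanoseconds from clock_gettime() are scaled to "ticks" using
    // ticksPerSecond, and scaled back on the way out. If
    // ticksPerSecond is wrong (e.g. the hard-coded fallback), the
    // round trip snaps timestamps to a grid that isn't the clock's.
    long nsecsToClockTicks(long nsecs, long ticksPerSecond)
    {
        return nsecs * ticksPerSecond / 1_000_000_000L;
    }

    long clockTicksToNSecs(long ticks, long ticksPerSecond)
    {
        return ticks * 1_000_000_000L / ticksPerSecond;
    }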
> As for the oddities with how ticksPerSecond is set when a weird
> value is detected, I did some digging, and it comes from two
> separate issues.
>
> The first - https://issues.dlang.org/show_bug.cgi?id=16797 -
> has to do with how apparently on some Linux systems (CentOS
> 6.4/7 were mentioned as specific cases), clock_getres will
> report 0 for some reason, which was leading to a division by
> zero.
Curious. The clock flagged in that bug report is
CLOCK_MONOTONIC_RAW, which I have never used. I wonder: could
clock_getres() have been reporting 0 because that clock's
resolution was finer than the result type can represent? Or could
the platform have been determining the result by sampling the
clock at two points in time so close together that they landed
within the same clock step, thereby yielding a difference of 0?
In either case, perhaps the platform code has been updated since
that 2013 CentOS release; it reported a resolution of 1 ns when I
tried it on my Debian system today.
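For anyone who wants to check their own system, a probe along
these lines works; since CLOCK_MONOTONIC_RAW is Linux-specific, it
uses the core.sys.linux.time binding:

    import std.stdio : writefln;
    import core.sys.posix.time : clock_getres, timespec;
    import core.sys.linux.time : CLOCK_MONOTONIC_RAW;

    void main()
    {
        timespec res;
        // Reports 1 ns on my Debian system; the CentOS systems in
        // issue 16797 apparently reported 0, hence the division by
        // zero.
        clock_getres(CLOCK_MONOTONIC_RAW, &res);
        writefln("CLOCK_MONOTONIC_RAW resolution: %d s %d ns",
            res.tv_sec, res.tv_nsec);
    }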
> As for the other case, it looks like that was an issue that
> predates MonoTime and was originally fixed in TickDuration. The
> original PR was
>
> https://github.com/dlang/druntime/pull/88
>
> The developer who created that PR reported that with
> clock_getres, some Linux kernels were giving a bogus value that
> was close to 1 millisecond when he had determined that (on his
> system at least) the actual resolution was 1 nanosecond.
I disagree with that developer's reasoning. Why should our
standard library override a value reported by the system, even if
the value was surprising? If the system was reporting 1
millisecond for a good reason, I would want my code to use that
value. If it was a system bug, I would want it confirmed by the
system maintainers before meddling with it, and even then, I
would want any workaround to be in my application, not in library
code where the fake value would persist long after the system bug
was fixed.
> Honestly, at this point, I'm inclined to just make it so that
> ticksPerSecond is always nanoseconds on POSIX systems other
> than Mac OS X. That way, we're not doing some math that is
> pointless if all we're trying to do is get monotonic timestamps
> and subtract them. It should also improve performance slightly
> if we're not doing that math.
I think that makes sense. The POSIX clock_gettime() timestamps are
defined in nanoseconds, after all, so treating them as such would
make the code both correct and easier to follow.
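Concretely, I read the proposal as roughly this on those systems
(an illustrative sketch, not a patch):

    import core.sys.posix.time : CLOCK_MONOTONIC, clock_gettime,
        timespec;

    // Ticks simply are nanoseconds, so no clock_getres() call and
    // no rescaling of what clock_gettime() returns.
    enum long ticksPerSecond = 1_000_000_000;

    long currTicks()
    {
        timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1_000_000_000L + ts.tv_nsec;
    }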
If we eventually discover APIs or docs on the other platforms that
report clock resolution (in either timestamp units or fractions of
a second), as clock_getres() does on POSIX, then that resolution
could be exposed through a separate MonoTime method.
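Something like this, purely hypothetical (the name, signature, and
placement are made up for illustration):

    import core.time : Duration, dur;
    import core.sys.posix.time : CLOCK_MONOTONIC, clock_getres,
        timespec;

    // Hypothetical helper: expose the system clock's step size
    // (the first sense of "ticks") without conflating it with
    // timestamp units.
    Duration clockResolution()
    {
        timespec res;
        clock_getres(CLOCK_MONOTONIC, &res);
        return dur!"seconds"(res.tv_sec) + dur!"nsecs"(res.tv_nsec);
    }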
> If we can then add functions of some kind that give you the
> additional information that you're looking for, then we can
> look at adding them.
Yes, we're thinking along the same lines. :)
Thanks for the thoughtful response.
Forest