
GNSS and time

Published: 2023-03

I recently had the pleasure™ of dealing with time, in particular with how it is represented inside computers and inside the GNSS (Global Navigation Satellite System). What initially seemed a reasonably straightforward problem turned into a surprisingly deep rabbit hole involving atomic clocks, leap seconds, a metrology and computer-history lesson focused on the narrow period between 1970 and 1972, and a short dive into the Linux kernel code.

This post is the summary of what I found while researching this topic.

Disclaimer

This topic turned out to be rather intricate and figuring out the "technically correct" answer became a personal challenge (which, as a friend of mine correctly pointed out, is a polite way of describing an obsession).

Luckily, 90% of the details reported below are irrelevant unless you need to handle timestamps before 1972 with second-level (or better) precision. This blog post assumes that you are a normal human being who only needs to convert UTC timestamps to UNIX timestamps/GPS timestamps after 1972, and therefore presents a simplified model of reality, correct enough for the assumed use case.

Isn't it fun what programmers consider a "normal human being".

Whenever this blog post makes a technically incorrect but practically correct enough statement, that statement will end with an asterisk (*). At the end of the blog post there's a section that describes what actually happened to time between 1970 and 1972.

Background: representing time

How can time be expressed as bits of information that can be processed by a computer?

First of all, we need to differentiate between two concepts: "time point" and "duration". A time point (often called "timestamp") is, as the name suggests, a precise point in time (e.g. the "1st of October 2004 at 14:32:13 in the Europe/Rome timezone"). On the other hand, a duration represents an extent of time (e.g. "5 seconds"). These two concepts can also be represented on a timeline:

An image of a time line showing the difference between time point and duration

These two concepts can also be combined in a somewhat intuitive "time arithmetic":

time point + time point = undefined
time point + duration = time point
time point - time point = duration
duration ± duration = duration
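
This arithmetic maps directly onto the time_point and duration types of C++'s std::chrono library (which will show up again later in this post). A minimal sketch, just to make the rules above concrete:

#include <chrono>

int main() {
  using namespace std::chrono;
  // The "time arithmetic" above, expressed with std::chrono's types.
  system_clock::time_point now   = system_clock::now();
  system_clock::time_point later = now + seconds{5};   // time point + duration = time point
  system_clock::duration elapsed = later - now;        // time point - time point = duration
  seconds total = seconds{5} + seconds{10};            // duration ± duration = duration
  // now + later;                                      // time point + time point: does not compile
  (void)elapsed; (void)total;
}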

Durations are relatively easy to represent in a computer: you can use a normal numerical variable and say, by convention, that it represents a duration of a certain number of (e.g.) milliseconds. Durations behave like other scalar quantities (length, weight, ...) and can be managed as such. The only part that requires some attention is mixing durations that use different SI ratios, for example when summing a duration in milliseconds with one in seconds.
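
As a small illustration of that last point, std::chrono handles mixed ratios by converting to the finer unit, and forces an explicit cast when precision would be lost:

#include <chrono>

int main() {
  using namespace std::chrono;
  // seconds + milliseconds converts to the finer unit, with no loss of precision.
  milliseconds total = seconds{2} + milliseconds{500};   // 2500 ms
  // The opposite direction (milliseconds -> seconds) would truncate,
  // so it requires an explicit duration_cast.
  seconds whole = duration_cast<seconds>(total);          // 2 s
  (void)whole;
}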

On the other hand, time points are more difficult. An intuitive approach could be to use the same representation commonly used by humans, which in this post I will call the "wall-clock" representation. An example of such a representation could be 21/04/1975 12:17:00. The computer could store each piece independently (one variable for the day number, one for the month number, ...) and then combine them together when they have to be shown to the user. This approach is the one used by C's tm struct (docs) and by JavaScript's Date (docs).
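
As a quick sketch of what this looks like with C's tm struct (the field names are the real ones, the values are just the example date above):

#include <ctime>

// The wall-clock representation: one field per component of the date.
std::tm wall_clock_example() {
  std::tm t{};
  t.tm_year = 1975 - 1900;  // tm_year counts years since 1900
  t.tm_mon  = 4 - 1;        // tm_mon is zero-based (0 = January)
  t.tm_mday = 21;
  t.tm_hour = 12;
  t.tm_min  = 17;
  t.tm_sec  = 0;
  return t;                 // 21/04/1975 12:17:00, with no timezone information attached
}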

This approach has the clear advantage of being easy for humans to understand, but it has several shortcomings:

  1. It's not clear what timezone is being used. The same representation could map to different points on the time line depending on whether it is expressed in one timezone or another. And timezones are a source of a lot of headaches.
  2. A basic implementation of the wall-clock representation makes it trivial to represent invalid time points. For example, assuming that a normal integer variable is used internally to represent the number of days, one could set its value to 16245, which in most calendars is not a valid day number. Even assuming that we can afford the performance penalty required to ensure that only valid values are used, implementing such checks is definitely not trivial (if you are curious about why, watch this or look up what happened on the 5th of October 1582).
  3. A basic implementation of the wall-clock representation is quite wasteful memory-wise, requiring around six integer variables to represent a single time point.

Luckily, we can do better. Nowadays, most computer systems have converged on a representation based on an "epoch" (a precise time point taken as a reference) and a duration of time elapsed from that epoch. As far as this post is concerned, each epoch defines a new "time reference system", which in this post will be called "clock" (by analogy with the std::chrono C++ library).

As a mental model, each "clock" can be thought of as a way of giving a different "label" to a particular instant on the time line.

For example, let's consider the POSIX clock (often called UNIX time). It defines its epoch to be the 01-01-1970 00:00:00 UTC, which precisely identifies a single time point (*). Once the epoch is fixed, a POSIX time point can be expressed as a duration (e.g. a number of seconds) elapsed before or after that epoch. For example, a POSIX time point of 15 seconds would represent the time point 01-01-1970 00:00:15 UTC (*).

Technically, from what I understand, the POSIX standard only talks about seconds after the POSIX epoch, but the concept can be easily extended to arbitrary durations (milliseconds, nanoseconds, ...) after or before the epoch. A negative duration would express a time point before the epoch.

This representation, although less friendly to humans, addresses all the disadvantages of the wall-clock representation. Timezones are not a problem, as the epochs are usually defined using UTC time points. Moreover, invalid dates are not representable, as any duration before/after the epoch is by definition a valid date. Finally, each time point can be represented by a single value (e.g. the number of milliseconds after the epoch), leading to a way more compact representation.

As an additional bonus, computing how much time has elapsed between two time points expressed as epoch+duration is efficient, as it involves a single difference (assuming they share the same epoch). With the wall-clock representation this computation is significantly more complicated.
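
As a quick illustration using std::chrono (where system_clock is, for our purposes, a POSIX clock), this is what the epoch+duration representation and the single-subtraction difference look like; the date is just an example:

#include <chrono>
#include <iostream>

int main() {
  using namespace std::chrono;
  // Two time points expressed as epoch + duration (seconds after the system_clock epoch).
  sys_seconds a = sys_days{2004y/October/1} + 14h + 32min + 13s;
  sys_seconds b = sys_days{2004y/October/1} + 14h + 32min + 18s;
  // Each one is stored as a single integer...
  std::cout << a.time_since_epoch().count() << "\n";  // 1096641133
  // ...and the elapsed time between them is a single subtraction.
  std::cout << (b - a).count() << "\n";                // 5 (seconds)
}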

Background: leap seconds

Until here, we have conveniently ignored a piece of the puzzle: leap seconds.

Understanding leap seconds requires understanding the various ways that humans have come up with to decide how long a second is. Historically, the length of a second has been determined from astronomical references. In modern terminology, this way of determining the length of a second is called UT1, and it has two big drawbacks: astronomical references are difficult to measure precisely and, more importantly, the duration of a UT1 second is tied to Earth's rotational speed. This would not be a huge problem if it wasn't for the fact that Earth's rotational speed is not constant, and tends to progressively slow down over time.

This is a simplification, and a detailed explanation of UT1 and UT0 is beyond the scope of this post.

As the required precision for time keeping grew, the variability in length of a UT1 second started being a problem. This led to the development of a new definition of the second, based on precisely measurable and, more importantly, unchanging atomic properties. This new definition of the second resulted in the creation of the International Atomic Time (TAI for short), which is independent from Earth's rotational speed and the messiness of astronomical references. This new time scale soon became the "default" for scientific applications.

Given the differences in their definitions, the TAI and UT1 time scales get progressively "out of sync" with each other, because UT1 progressively falls behind TAI due to Earth's slowing rotation. By itself this would not be a problem, but it becomes one if we tried to use TAI as a "universal human time". Human activities continue to follow the sun cycle, and would therefore like to remain in UT1. If TAI became the "universal human time", over centuries the difference between TAI and UT1 would start to become meaningful, and we could potentially end up in a situation in which a human clock shows a time of 14:00:00 (two o'clock in the afternoon, in TAI) but the sun has already set (because it would be 21:00:00, nine o'clock in the evening, in UT1).

This is of course not desirable, and the "solution" was the creation of a new time scale, UTC (Coordinated Universal Time), which is designed to bridge the difference between TAI and UT1. The main idea is that one UTC second would always be equal to one TAI second (*), but periodically an integer number of seconds would be added to (or potentially removed from) UTC so that the difference between UT1 and UTC remains below one second (*). In symbols:

\[ UTC = TAI + n \]

\[ \vert UT1 - UTC \vert < 0.9 \, seconds \]

Where \(n\) is the integer number of leap seconds (*).

The relationship between UT1 and UTC can also be seen in the following plot, where the vertical jumps show the addition of a leap second (source):

Plot representing UT1-UTC

The decision on when to add a leap second is handled by the IERS and the current leap second situation is available here.
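
C++20's std::chrono models both time scales, so the current TAI-UTC offset can be seen directly. A small sketch, assuming a standard library recent enough to support C++20 chrono printing and the leap second database (e.g. recent GCC):

#include <chrono>
#include <iostream>

int main() {
  using namespace std::chrono;
  // The same instant seen through the UTC and TAI clocks.
  auto utc = clock_cast<utc_clock>(sys_days{2023y/March/1});
  auto tai = clock_cast<tai_clock>(utc);
  // TAI currently reads 37 seconds ahead of UTC (i.e. n = -37 in the formula above (*)).
  std::cout << utc << "\n" << tai << "\n";
}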

The short summary presented in this section should cover everything that's needed for the rest of this post. If you want a more in-depth look at UT0/UT1/UT2 vs UTC vs TAI see [1].

Although designed as a compromise, UTC and the leap second mechanism are not universally appreciated as a solution. There are currently ongoing discussions about removing the leap second mechanism entirely, with the current proposal being to stop adding leap seconds by 2035.

Background: the GNSS

The Global Navigation Satellite System (or GNSS for short) is a combination of several Earth-orbiting satellites that transmit positioning and timing data to GNSS receivers. These receivers then use this data to determine their own location on the Earth's surface. Every time you turn on the "geolocalization" feature of your smartphone, you're using the GNSS. Oftentimes, the term "GPS" is colloquially used to talk about the GNSS, but technically "GPS" is only the constellation of geolocalization satellites managed by the USA. Similarly, there are the Galileo constellation (Europe), the Glonass constellation (Russia) and the Beidou constellation (China).

The problem: GNSS clocks

In order to function properly, each satellite of each constellation needs to maintain a precise time. Moreover, each constellation also needs a way to represent this precise time inside the satellites' computer systems and inside the messages they broadcast. Luckily, most of the constellations considered in this post chose an epoch+duration representation. Unluckily, they all picked different epochs and there are also some differences in whether they take leap seconds into account.

Using the terminology introduced above, let's define the various clocks used by the GNSS [2]:

  1. GPS clock. Epoch is 06-01-1980 00:00:00 UTC. Leap seconds are ignored.
  2. Galileo clock. Epoch is 22-08-1999 00:00:00 UTC. Leap seconds are ignored.
  3. Beidou clock. Epoch is 01-01-2006 00:00:00 UTC. Leap seconds are ignored.
  4. Glonass clock. It's different from the others and is defined as UTC + 3 hours. Since it depends on UTC, it means that leap seconds are taken into account.
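
Written down as std::chrono calendar constants (these are just the dates from the list above, nothing more):

#include <chrono>

namespace gnss_epochs {
  using namespace std::chrono;
  // The epochs listed above, as wall-clock UTC dates.
  inline constexpr sys_days gps     = sys_days{1980y/January/6};
  inline constexpr sys_days galileo = sys_days{1999y/August/22};
  inline constexpr sys_days beidou  = sys_days{2006y/January/1};
  // Glonass has no epoch of its own: it is defined as UTC + 3 hours.
}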

Software that works with GNSS data (like receivers and simulators) usually also has to have a way to represent a UTC time point. Often, that's handled through the POSIX clock. As stated above, its epoch is 01-01-1970 00:00:00 UTC (*). Despite being commonly used to represent UTC time, it does not take into account leap seconds [3]. This means that it is not a particularly good way to represent UTC, but it is nevertheless used.

Here the "does not take into account leap seconds" warrants a longer explanation. The behaviour of other clocks (e.g. those based on TAI like the GPS clock) during a leap second is well defined and is intuitive: another second passes. The typical applications in which POSIX time points are used (web servers, database servers, servers in general) generally don't want to have to care too much about leap seconds, yet they still want to use POSIX time points to represent UTC times, because of the ease of using them in the various programming environments.

This, combined with the fact that a POSIX day must always have the same number of POSIX seconds (86400), led to various "workarounds" which make the behaviour of a POSIX clock around a leap second implementation-defined. Some commonly implemented behaviours are:

  1. Repeating the same POSIX time point during the leap second.
  2. Spreading the leap second by making many POSIX seconds around the leap second slightly longer than they should be (generally called "leap second smearing").

In order to better understand this, let's see what happens to UTC, TAI and the POSIX clock around a leap second. Let's consider the leap second added on the night between the 31st of December 1979 and the 1st of January 1980:

UTC (wall-clock)        TAI (wall-clock)        Elapsed UTC seconds    POSIX (repeated)    POSIX (smeared)
                                                (since POSIX epoch)
31 Dec 1979 23:59:58    01 Jan 1980 00:00:16    315532806              315532798           315532798
31 Dec 1979 23:59:59    01 Jan 1980 00:00:17    315532807              315532799           315532799
31 Dec 1979 23:59:60    01 Jan 1980 00:00:18    315532808              315532799           315532800
01 Jan 1980 00:00:00    01 Jan 1980 00:00:19    315532809              315532800           315532800

The first column is straightforward and shows exactly what we would expect to see on a correct UTC clock: the day has an extra second (23:59:60) and that's the leap second.

The second column shows that TAI does not handle the leap second in any particular way: it is just another elapsed second. Note that in the first line of the table, TAI is already 18 seconds ahead of UTC, because 18 leap seconds have already been added to UTC before this time point (*). On the last line of the table, TAI is 19 seconds ahead of UTC, because a new leap second has elapsed.

The third column shows the number of seconds elapsed since 01-01-1970 00:00:00 UTC (*). This looks similar to a POSIX time point, but it is different, because leap seconds count as elapsed seconds.

The fourth and fifth columns show two of the possible behaviours of the POSIX clock around a leap second. First of all, we can notice that in the first row there is an 8 second difference between the third column and the fourth/fifth columns: these are the 8 leap seconds added to UTC since the POSIX epoch (*). This shows what it means that POSIX does not "take into account" leap seconds: although those 8 seconds are by all means elapsed seconds, POSIX time points just ignore them. We can also see that in the last row of the table this difference becomes 9 seconds, because a new leap second has elapsed. (The difference between the first and second columns is larger, 18 seconds, because TAI also counts the 10 leap seconds that predate the POSIX epoch (*).)

In the fourth column, the leap second is handled by repeating a POSIX time point twice. This means that a single POSIX time point actually maps to two points on the time line. In the fifth column, the leap second is smeared across multiple POSIX seconds. In this example, it is smeared over just three POSIX seconds for simplicity (and the values shown are rounded to whole seconds), but in reality this happens over many more POSIX seconds, so that the change in the duration of each second is much less noticeable.

This quirk of the POSIX clock also means that it abuses the definition of "clock" that has been used throughout the article. To keep this in mind, for the rest of the blog post, let's differentiate between the POSIX clock, which behaves as described in the fourth column of the table, and the UTC clock, which instead behaves as described in the third column of the table.

Clock conversions

A common necessity when writing software working with the GNSS is to convert a time point from one clock to another.

Let's assume that we are working with time points represented as epoch and duration. Moreover, let's use the \(t_{clock}\) notation to denote a time point \(t\), expressed as a duration (e.g. number of seconds) elapsed from the \(clock\)'s epoch. Let also \(n_t\) be the number of leap seconds at time \(t\).

Given the table above, and how the introduction of a leap second changes the relationship between the "Elapsed UTC seconds" and the "POSIX (repeated)" columns of the table, the conversion between UTC and POSIX clocks is as follows:

\[ t_{UTC} = t_{POSIX} + n_{t} \]
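
This relationship can be checked directly with std::chrono, where system_clock plays the role of the POSIX clock and utc_clock the role of our UTC clock. A minimal check, assuming the same C++20 chrono support as before (the date is arbitrary):

#include <chrono>
#include <iostream>

int main() {
  using namespace std::chrono;
  sys_seconds posix = sys_days{2020y/June/1};            // t_POSIX (no leap seconds counted)
  utc_seconds utc   = clock_cast<utc_clock>(posix);      // t_UTC (leap seconds counted)
  seconds n = get_leap_second_info(utc).elapsed;          // n_t: 27 seconds at this date
  std::cout << (utc.time_since_epoch() == posix.time_since_epoch() + n) << "\n";  // prints 1
}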

The next step is to be able to convert from our UTC clock to the GNSS clocks. We can do this by counting how many UTC seconds have elapsed between the POSIX epoch and the epochs of the various GNSS clocks.

Note that this number of elapsed seconds includes leap seconds, and is therefore different from the POSIX timestamp that represents the epoch of the GNSS clock we are considering.

For example, consider the GPS epoch, which has the wall-clock representation 06-01-1980 00:00:00 UTC. This time point corresponds to the POSIX timestamp 315964800 seconds, but has the timestamp 315964809 seconds in our UTC clock, because between 01-01-1970 00:00:00 UTC and 06-01-1980 00:00:00 UTC there have been 9 leap seconds (*). [4]

This reasoning can be done for each GNSS epoch and is summarized in the following image (not to scale):

An image showing the timeline with all the various clock epochs and their distance in elapsed seconds

This allows us to write the following conversions:

\[ t_{TAI} = t_{UTC} + 378691210 \, seconds \]

\[ t_{GPS} = t_{UTC} - 315964809 \, seconds \]

\[ t_{Galileo} = t_{GPS} - 619315200 \, seconds \]

\[ t_{Beidou} = t_{Galileo} - 200793600 \, seconds \]

In the image and in the previous equations the Glonass clock is missing. Given that its definition is a constant offset of three hours from UTC, and that we are already "using" a UTC clock, the conversion is straightforward:

\[ t_{Glonass} = t_{UTC} + 10800 \, seconds \]
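
Spelled out as code, the fixed-offset conversions above could look something like this (a minimal sketch operating on raw second counts, where t_utc is the number of elapsed UTC seconds since the POSIX epoch):

#include <cstdint>

// The fixed-offset conversions above, over raw second counts.
constexpr std::int64_t tai_from_utc(std::int64_t t_utc)        { return t_utc + 378691210; }
constexpr std::int64_t gps_from_utc(std::int64_t t_utc)        { return t_utc - 315964809; }
constexpr std::int64_t galileo_from_gps(std::int64_t t_gps)    { return t_gps - 619315200; }
constexpr std::int64_t beidou_from_galileo(std::int64_t t_gal) { return t_gal - 200793600; }
constexpr std::int64_t glonass_from_utc(std::int64_t t_utc)    { return t_utc + 10800; }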

This solves the original problem of converting between the various GNSS clocks, the POSIX clock and the UTC clock. This is also consistent with the classes system_clock, tai_clock, utc_clock and gps_clock from C++'s std::chrono library (cppreference docs, although the Microsoft docs add a few more background details).

In fact, as an example, we can take some inspiration from GCC's implementation of std::chrono and define an additional custom chrono clock that handles the Galileo clock and that can then be used as any other std::chrono clock (even with std::chrono::clock_cast):

#include <chrono>

template<typename Duration>
using utc_time = std::chrono::time_point<std::chrono::utc_clock, Duration>;

class galileo_clock {
  public:
    using rep = std::chrono::system_clock::rep;
    using period = std::chrono::system_clock::period;
    using duration = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<galileo_clock>;
    static constexpr bool is_steady = false;

    static time_point now() { return from_utc(std::chrono::utc_clock::now()); }

    template <typename Duration>
    static utc_time<std::common_type_t<Duration, std::chrono::seconds>>
    to_utc(const std::chrono::time_point<galileo_clock, Duration> &t){
      using CDur = std::common_type_t<Duration, std::chrono::seconds>;
      // 315964809 s from the POSIX/UTC epoch to the GPS epoch (leap seconds included),
      // plus 619315200 s from the GPS epoch to the Galileo epoch.
      return utc_time<CDur> { t.time_since_epoch() }
        + std::chrono::seconds { 315964809 }
        + std::chrono::seconds { 619315200 };
    }

    template <typename Duration>
    static std::chrono::time_point<galileo_clock, std::common_type_t<Duration, std::chrono::seconds>>
    from_utc(const utc_time<Duration> &t){
      using CDur = std::common_type_t<Duration, std::chrono::seconds>;
      // Same offsets as in to_utc, applied in the opposite direction.
      return std::chrono::time_point<galileo_clock, CDur> { t.time_since_epoch() }
        - std::chrono::seconds { 315964809 }
        - std::chrono::seconds { 619315200 };
    }
};

This galileo_clock can be easily adapted to work for Beidou and Glonass by using the numbers reported in the formulas above.

As an aside, isn't C++ syntax lovely?

With all those templatey angle brackets, I think they really make the code terse. Oh and don't you love how you always have to repeat std:: and std::chrono:: everywhere? I know that you have to do it only because using directives hopelessly pollute your scope if you use them in headers, but it really makes it clear where each name comes from, even if you're using them fifteen hundred times in five lines. It really makes the code more understandable~. (╯°Д°)╯ ┻━┻

The asterisks

Alas, the story does not end here. As you might have noticed, the whole blog post was sprinkled with asterisks.

As stated above, unless you need to deal precisely with timestamps before 1972 the asterisks shouldn't bother you. However, if you are curious, keep on reading to find out what they are all about.

Those asterisks stem from the following assumptions that are often considered to be true, but technically aren't:

  1. 1 UTC second is always equal to 1 TAI second
  2. There have been 10 leap seconds between the TAI epoch and 01-01-1970
  3. The POSIX epoch is 01-01-1970 00:00:00 UTC
  4. "POSIX time" has always meant what it means now

Let's look at them in detail, starting from the first one. The equivalence

\[ 1 \, \text{UTC second} = 1 \, \text{TAI second} \]

is true only from the beginning of 1972 onwards. Before that date, leap seconds in their current form did not exist, and the difference between TAI and UTC could be a fractional number of seconds. Moreover, the length of a TAI second would be different from the length of a UTC second, and the length of a UTC second would change over time.

This proved to be quite cumbersome to maintain, due to the frequent jumps applied to UTC to keep it in sync with UT2 and to the complexity of converting between TAI and UTC.

In the end, it was decided that from 1972 onwards UTC would behave differently. It would track UT1 (instead of UT2), and 1 UTC second would last exactly as long as 1 TAI second. A final, irregular jump was applied to UTC at 01-01-1972 00:00:00 UTC. This last irregular jump made it so that the time point 01-01-1972 00:00:00 UTC matched exactly with 01-01-1972 00:00:10 TAI (note the extra 10 seconds), and from there onwards the two systems would "tick" at the same speed and be separated by an integer number of seconds, leading to the modern concept of leap seconds. [5]

The 10 second difference between TAI and UTC comes from the lag accumulated by UT1 (and therefore UTC) between 1958 (the TAI epoch) and 1972. In other words, it could be said that between 1958 and 1972 there were "in total" 10 leap seconds, which shows why the second assumption stated above is wrong: the 10 leap seconds accumulated between 1958 and 1972, not between 1958 and 1970.

We can confirm this by looking at the historical TAI-UTC differences published by the USNO (U.S. Naval Observatory) [6], a snippet of which is reported below:

1965 JUL  1 =JD 2438942.5  TAI-UTC=   3.7401300 S + (MJD - 38761.) X 0.001296 S
1965 SEP  1 =JD 2439004.5  TAI-UTC=   3.8401300 S + (MJD - 38761.) X 0.001296 S
1966 JAN  1 =JD 2439126.5  TAI-UTC=   4.3131700 S + (MJD - 39126.) X 0.002592 S
1968 FEB  1 =JD 2439887.5  TAI-UTC=   4.2131700 S + (MJD - 39126.) X 0.002592 S
1972 JAN  1 =JD 2441317.5  TAI-UTC=  10.0       S + (MJD - 41317.) X 0.0      S
1972 JUL  1 =JD 2441499.5  TAI-UTC=  11.0       S + (MJD - 41317.) X 0.0      S
1973 JAN  1 =JD 2441683.5  TAI-UTC=  12.0       S + (MJD - 41317.) X 0.0      S
1974 JAN  1 =JD 2442048.5  TAI-UTC=  13.0       S + (MJD - 41317.) X 0.0      S

As you can see, the situation before 1972 was quite messy. Using this data, it can be computed that at 01-01-1970 00:00:00 UTC the difference between TAI and UTC was about 8 seconds, not 10 seconds as reported in the time line image above (and hence the asterisks). [7]
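
For the record, that 8-second figure comes straight from the 1968 FEB 1 row of the table above, evaluated at the MJD of the POSIX epoch (40587). A one-liner to check it:

#include <iostream>

int main() {
  // TAI-UTC at 01-01-1970, using the 1968 FEB 1 row above (MJD 40587 = 01-01-1970).
  double tai_minus_utc = 4.2131700 + (40587.0 - 39126.0) * 0.002592;
  std::cout << tai_minus_utc << "\n";   // prints roughly 8.00008 (seconds)
}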

But then there is the question of what happened to those two seconds. In full "worse is better" spirit, the computing world basically decided that UTC before 1972 didn't happen. What was done instead was to take UTC in its post-1972 version and just "roll with it" backwards, 1 TAI second (and therefore 1 post-1972 UTC second) at a time.

For example, what Linux currently calls 01-01-1970 00:00:00 UTC would actually be roughly 01-01-1970 00:00:02 UTC if we use the real UTC as it was defined between 1970 and 1972, as shown by the following diagram:

An image showing the relationship between Linux UTC, real UTC and TAI

Note that, after 1972, "Linux" UTC and "real" UTC behave in the same way. Given this, a more precise definition of POSIX time would be "the number of TAI seconds elapsed since 1972-01-01 00:00:00 (real) UTC, plus the 63072000 seconds contained in the two years 1970 and 1971, minus the number of leap seconds inserted since 1972", which is quite a mouthful. This difference between "real" UTC and "Linux" UTC is why the third assumption mentioned at the beginning of this section is false.
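
In symbols (this is just my own rendering of the sentence above, not a formula taken from any standard):

\[ t_{POSIX} = 63072000 + (\text{TAI seconds elapsed since 01-01-1972 00:00:00 UTC}) - (\text{leap seconds since 1972}) \]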

Sources for this are surprisingly sparse. Besides Taylan Kammer's notes on the subject (whom I should also thank for the kind emails that helped me solve this conundrum) and a few random tidbits here and there, I couldn't find much.

Therefore, I decided to look for the authoritative source of truth: the source code of the time() system call in a recent version of Linux (commit 2fcd07b7ccd5fd10b2120d298363e4e6c53ccf9c at the time of writing).

Now, I am not familiar at all with the Linux kernel source code, so take whatever follows with a huge grain of salt. However, it seems that the time syscall is just a thin wrapper around ktime_get_real_seconds.

This in turn seems to "just" return the xtime_sec field of an instance of the timekeeper struct, which seems to be initialized in the timekeeping_init function.

This function in turn defers to read_persistent_wall_and_boot_offset, which in turn calls read_persistent_clock64 from the rtc.c file of the specific architecture (x86 in our case).

This uses the get_wallclock method of the x86_platform_ops struct, which is bound to mach_get_cmos_time (source), which in turn calls mc146818_get_time. This function (with a few more in-between steps) reads the current time directly from the CMOS chip.

The tm struct read from the CMOS represents the time as year-month-day hour-minute-seconds, as we can see both from the struct's fields names and the MC146818 datasheet.

The kernel needs to convert this to POSIX time, and we can see that it does so at the end of mach_get_cmos_time by calling rtc_tm_to_time64. This in turn calls mktime64, and there our journey ends.

mktime64 does a very straightforward conversion, which assumes that all minutes have the same length (so no leap seconds) and, more importantly for our question, that every second has always had the same length. We finally have a first-hand source confirming what's going on with POSIX time before 1972, at least on the Linux kernel version identified by that commit. If we wanted to be really thorough we would also have to check all the other architectures, but I'll call this good enough.
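
For the curious, that kind of conversion boils down to something like the following sketch. This is not the kernel's code (it uses Howard Hinnant's well-known days_from_civil algorithm, and the simple_mktime name is just for illustration), but the idea is the same: count days, multiply by 86400, add the time of day.

#include <cstdint>

// Days between 01-01-1970 and the given proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;
  const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
  const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
  return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// No leap seconds, no variable-length seconds: just days * 86400 plus the time of day.
constexpr std::int64_t simple_mktime(std::int64_t y, unsigned mon, unsigned day,
                                     unsigned h, unsigned min, unsigned s) {
  return days_from_civil(y, mon, day) * 86400 + h * 3600 + min * 60 + s;
}

static_assert(simple_mktime(1980, 1, 6, 0, 0, 0) == 315964800);  // the GPS epoch seen earlier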

This leaves us with the fourth assumption, that "POSIX time" has always meant what it means now. This, too, is not true.

Although here we should probably make a distinction between POSIX time and UNIX time. I don't know enough about that, and didn't want to go down *that* rabbit hole.

If we go back to 1971 and look at the First Edition of the Unix Programmer's Manual, we will see that originally "UNIX time" was measured in sixtieths of a second since midnight, 1st January 1971:

An image of the original UNIX programmer's manual showing the original definition of the time syscall.

The definition of "UNIX time" changed in 1972, when it was modified to still return the sixtieths of a second, but since midnight, 1st January 1972. It was changed again in 1973 to the current definition (with the exception of referencing GMT instead of UTC). [8].

This means that the question "what's the POSIX time of this time point" has different answers depending on when you ask it, which puts the final nail in the coffin of the idea of using POSIX time to represent dates before 1973 if you need to be precise.

Conclusions

All of this started with an easy question: how to convert a POSIX timestamp to a GPS timestamp. As in the best of traditions, finding the correct answer required a surprising amount of historical context, which I am happy to have summarized here (despite the absurd time commitment that this required).

Feel free to use the material reported in this post for whatever project you're working on, but please link back to this blog post if you share it publicly.

References

[1] I wasn't able to find a precise, authoritative definition of TAI/UT1/UTC and didn't want to invest more time into looking for one. I relied on various other Internet things I found (a, b, c, d).

[2] Summary of the various clocks in Navipedia.

[3] POSIX standard and Stackoverflow answer regarding POSIX time and leap seconds

[4] Stackoverflow answer concerning this. Not all answers to the question are equally correct.

[5] Source and Wikipedia

[6] The conversion table published by USNO.

[7] The 8 seconds

[8] A very helpful Stackoverflow summary and another blog post with more rabbit holes to dig into if you're interested.