20141107 – Probably Noon

When we send NTP packets around to measure time, we run into a bit of a problem.

Imagine a client that sends a packet at time T1, at one o’clock.

The packet takes its sweet time and arrives at the server at T2, and the server dutifully records that it received the packet at 3 o’clock.

Later the server gets around to sending a reply, at T3, and writes into the packet that it was sent at 5 o’clock.

And finally that packet arrives back at the client at T4, 7 o’clock.

(RFC 1149 networks are wonderful teaching aids, don’t you think?)

../../_images/20141107_fig1.svg

If we make the assumption that the packet took the same amount of time in both directions, the math adds up and we can conclude that the two computers agree on what time it is.
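The arithmetic behind that conclusion is the usual NTP offset and round-trip delay calculation. Here is a minimal Python sketch using the example's hours as plain numbers; the function name `offset_and_delay` is made up for illustration:

```python
def offset_and_delay(t1, t2, t3, t4):
    """Classic NTP calculation from one packet exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)

    offset: estimated server time minus client time, assuming the
            outbound and return trips took equally long
    delay:  total time the packets spent in the network, both ways
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# The example exchange: sent at 1, received at 3, replied at 5, back at 7.
print(offset_and_delay(1, 3, 5, 7))  # (0.0, 4)
```

An offset of zero means the clocks agree; the four hours of delay are simply split evenly between the two directions.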

But if one of the packets happened to hitch a ride on the Sub-Etha wave-band, it might look like this:

../../_images/20141107_fig2.svg

If we still make the assumption that the packet transit times are identical, the picture will look like this:

../../_images/20141107_fig3.svg

And we conclude, erroneously, that the client’s clock is two hours ahead of the server’s clock.

Had it been the other packet, we would conclude that it was two hours behind the server’s clock.

../../_images/20141107_fig4.svg

The trouble is that we cannot actually measure how long it took each packet to cross the network, and, if that wasn’t bad enough, that transit time is not only variable, it is also our main source of uncertainty and noise.

Given the measurements in the example, all we can know for certain is that the client’s clock is no more than two hours wrong.
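The two-hour bound falls out of the same four timestamps. If the outbound trip took no time at all, the offset is T2 − T1; if the return trip took no time, it is T3 − T4. Every other split of the network time lies in between, so the true offset is within ±delay/2 of the symmetric estimate. A small sketch, with a made-up function name:

```python
def offset_bounds(t1, t2, t3, t4):
    """Interval that must contain the true offset (server minus client).

    Nothing is assumed about how the round trip splits between the two
    directions, only that neither direction took negative time.
    """
    low = t3 - t4    # the return trip took no time at all
    high = t2 - t1   # the outbound trip took no time at all
    return low, high


low, high = offset_bounds(1, 3, 5, 7)
print(low, high)          # -2 2
print((high - low) / 2)   # 2.0, half the round-trip delay
```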

But the Sub-Etha wave-band is improbable, and our experience tells us that most packets take roughly the same time from A to B as from B to A. Therefore we can define a probability function which looks like this:

../../_images/20141107_fig5.svg
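Assuming the triangle peaks at the symmetric-transit estimate and falls to zero at the two bounds, its height must be 2/delay for the area under it to be 1. A minimal sketch of such a density follows; `triangular_pdf` is a hypothetical name, and the exact shape used in the plots may differ:

```python
def triangular_pdf(t1, t2, t3, t4):
    """Return a function x -> probability density of the offset being x.

    Assumes a symmetric triangle: zero outside [t3 - t4, t2 - t1] and
    peaking at the classic NTP offset estimate in the middle.
    The round-trip delay must be positive.
    """
    low, high = t3 - t4, t2 - t1
    mid = (low + high) / 2
    half_width = (high - low) / 2   # half the round-trip delay
    peak = 1 / half_width           # area = base * height / 2 = 1

    def pdf(x):
        distance = abs(x - mid)
        if distance >= half_width:
            return 0.0
        return peak * (1 - distance / half_width)

    return pdf


pdf = triangular_pdf(1, 3, 5, 7)
print(pdf(0), pdf(1), pdf(2))  # 0.5 0.25 0.0
```

Because the peak height is 2/delay, halving the round-trip time doubles the height, which is why servers with short round trips produce taller, narrower curves.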

The nice thing about such “Probability Density Functions” is that you can do math on them.

Here is a wonderful report from Sandia National Laboratories which will teach you more about the subject than you ever thought you would need to know: “Constructing Probability Boxes and Dempster-Shafer Structures”.

Here are four pdfs, each one from a single packet exchange with a server from pool.ntp.org:

../../_images/20141107_fig6.svg

The green line is the joint probability, showing that my laptop’s clock is very likely correct.
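One way to get such a joint curve, assuming the individual exchanges are independent, is to multiply the densities point by point and rescale the product so its area is 1 again. This is a sketch of that idea on a discrete grid, not necessarily how the plot above was made; the helper names and the four (offset, delay) pairs are invented for illustration:

```python
def triangle(offset, delay):
    """Triangular density centred on offset, zero beyond +/- delay/2."""
    half = delay / 2
    return lambda x: max(0.0, (1 - abs(x - offset) / half) / half)


def joint(pdfs, xs):
    """Multiply densities pointwise on the evenly spaced grid xs and
    renormalise the result so that it integrates to 1."""
    step = xs[1] - xs[0]
    product = []
    for x in xs:
        p = 1.0
        for f in pdfs:
            p *= f(x)
        product.append(p)
    area = sum(product) * step
    return [p / area for p in product]


# Invented (offset, delay) pairs in milliseconds from four exchanges.
samples = [(0.4, 30.0), (-2.0, 34.0), (1.5, 28.0), (0.9, 40.0)]
pdfs = [triangle(o, d) for o, d in samples]
xs = [i * 0.01 for i in range(-2000, 2001)]   # -20 ms .. +20 ms
density = joint(pdfs, xs)

# The most likely offset is where the joint density peaks.
best = max(range(len(xs)), key=density.__getitem__)
print(f"most likely offset: {xs[best]:.2f} ms, peak density {density[best]:.3f}")
```

The product is zero wherever any single exchange rules an offset out, so the joint curve is never wider than the narrowest individual triangle.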

Here is a plot showing why you should use the NTP servers closest to you in packet transit time:

../../_images/20141107_fig7.svg

Notice first of all that the Y axis is logarithmic now.

The probability function of the LAN servers is about 30 times taller than that of the dk.pool.ntp.org servers, which in turn is about 20 times taller than that of the nz.pool.ntp.org servers.

How can a probability be 1700, you might ask? The answer is that this is not a probability, but a probability density.

As a rule of thumb, a ‘pd’ of 1700 means that the probability of the clock being inside a window 1/1700 of a second wide is 1. In other words, my laptop’s clock agrees with the LAN servers within a window of roughly 600µs.

That does not mean that it is within ±300µs: the X value where the function peaks must be taken into account, making it something like 700 ± 300µs relative to the servers’ clocks.

Likewise the peak of 10 from the us.pool.ntp.org servers means that I can be certain that the uncertainty is no more than 0.1s.
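The rule of thumb is just the reciprocal of the peak height: a rectangle of area 1 and height h is 1/h wide. A sketch of that arithmetic with the two peak values quoted above, treating each density as if it were rectangular, which is only an approximation for the more peaked real curves; the function name is made up:

```python
def rule_of_thumb_width(peak_density):
    """Width in seconds of a rectangle with area 1 and height peak_density.

    A rough stand-in for the real, more peaked, distribution.
    """
    return 1.0 / peak_density


for label, peak in [("LAN servers", 1700), ("us.pool.ntp.org", 10)]:
    width = rule_of_thumb_width(peak)
    print(f"{label}: window about {width * 1e6:.0f} us wide, "
          f"i.e. +/-{width / 2 * 1e6:.0f} us around the peak")
```

The window is centred on the X value of the peak, not on zero, so both numbers must be read together.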

In classical NTPD we always pick one server at a time, and steer our clock only with the measurements from that server.

We do look at the measurements from all other configured servers, to see if we should switch to a better server, but we do not try to combine the measurements.

My experiments so far indicate that combining the measurements from all configured servers using this “triangular pdf” idea shows a lot of promise for increased clock stability and noise resistance.

phk