20150321 – Keeping it simple

I wrote last time about the problem of how to interpret DNS responses, in particular faced with the NTP pool etc.

Since then things have only gotten even more hairy, for instance a number of correspondents have pointed out other NTP-server pools and my idea for a heuristic for dealing with such pools have not held up under real-world testing.

There is a certain perverse satisfaction in ripping out code you have spent a lot of time writing, similar to destroying the sand castle before going home from the beach, but despite Fred P. Brooks strong admonition, I still find it very hard to toss my prototypes.

Starting from scratch, I have come up with a much simpler and more robust algorithm. In pseudo-code, the outline looks like this:

For each server-argument:
    Do DNS lookup
    Create one peer per IP number.
    while (1):
       Poll next peer
       if reply is "Kiss-Of-Death":
           Blacklist peer one day
       If we lost quorum, or X seconds has passed:
           Remove peers which have not been in DNS response for a week
           Disable peers which have not been in quorum for X seconds.
           Perform DNS lookup
           While there are less active peers than IP# in DNS response
             AND there are new IP#s in DNS response:
               Create new peer.
           While #peers > #IPs in DNS response:
               Remove peers not in DNS response in reverse usability order.
               (blacklisted -> disabled -> worst time source)
           Reenable those disabled peers which survived

The main compromise is that we do not immediately follow a DNS change for a non-pool hostname.

The biggest potential for surprise is probably that it takes a full week until clients discover that you changed an IP# in the DNS record, but they find additions or removals in less than X seconds.

In the pool scenario there is no doubt that the one week timeout will cause servers with low traffic-share to get needlessly replaced because the DNS refresh misses the short time windows where the pool announces that server.

For the client in question, this is potentially an undesirable clock disturbance, but from the pool and in particular from the servers point of view it is probably a benefit to have ‘persistent’ clients migrate towards servers with larger traffic-share, leaving more headroom for traffic-engineering the low-traffic servers.

And credit should be given where credit is due: Ask Bjørn Hansen, competent wrangler of pool.ntp.org, proposed the one week timeout over a month ago, but I kept thinking I could do something better and smarter than that.

“X seconds”

For the above to work as desired in the pool scenario, we must do sufficiently many DNS lookups in a week, that there is a credible chance that servers reappear in the results.

The limiting factor here are the large pools, like us.pool.ntp.org where a couple of thousand servers make an appearance different fractions of the time.

My gut-feeling is that checking every couple of hours (and on demand if the clock-combiner fails to build a quorum) sounds about right: 84 polls in a week, each typically responding with four IP numbers should give servers with just 1% traffic-share good chances of surviving.

“Poll next peer”

In traditional NTP, each peer has a poll-interval which is respected almost religiously, which together with the fact that poll intervals are always a power of two enters into the mathematics in a rather fundamental way.

In Ntimed I’m taking a different angle on this aspect.

Each peer still has a desired poll-interval, but instead of scheduling polls strictly at these times, we schedule polls a the long term average rate necessary to service them.

For example, if we have 5 peers to be polled every 64 seconds and 3 peers to poll every 2048 seconds, we will poll one server every:

CONST / (5 * CONST / 64 + 3 * CONST / 2048) = 12.56 seconds

(The easiest way to understand this is to think of CONST as a full day, but mathematically the value chosen doesn’t matter.)

This advantage to this scheme is that the offset estimates arrive at the PLL at a constant rate and we can use timing information also from the packet exchanges with servers we poll seldomly.

But as a result we no longer poll the servers at the exact requested rate, running the above example through a trivial simulation stabilizes at this pattern after some time:

Time     Server    dT
18438.1     A    62.8
18450.6     B    62.8
18463.2     C    62.8
18475.8     D    62.8
18488.3     E    62.8
18500.9     a  2047.3
18513.4     A    75.4
18526.0     B    75.4
18538.6     C    75.4
18551.1     D    75.4
18563.7     E    75.4
18576.2     b  2047.3
18588.8     A    75.4
18601.4     B    75.4
18613.9     C    75.4
18626.5     D    75.4
18639.0     E    75.4
18651.6     c  2047.3
18664.2     A    75.4
18676.7     B    75.4
18689.3     C    75.4
18701.8     D    75.4
18714.4     E    75.4
18727.0     A    62.8
18739.5     B    62.8
18752.1     C    62.8
18764.6     D    62.8
18777.2     E    62.8

A-E are the 64 second servers, and most of the time they are polles slightly faster, but every time one of the slow poll servers (a-c) pops in, their next poll gets delayed a bit.

But being freed from the power-of-two restriction, there is nothing to prevent us from asking for poll-rates of 64, 81, 95, 100, 107, 1740, 1920 and 3000 seconds which may come handy some day.

Here is my python code if you want to play with this:

from __future__ import print_function

l = [
        [ 0, "A", 64, 0 ],
        [ 0, "B", 64, 0 ],
        [ 0, "C", 64, 0 ],
        [ 0, "D", 64, 0 ],
        [ 0, "E", 64, 0 ],
        [ 0, "a", 2048, 0 ],
        [ 0, "b", 2048, 0 ],
        [ 0, "c", 2048, 0 ],

r = 0.0
for i in l:
        r += 1.0 / i[2]
r = 1.0 / r

print("Rate", r)

t = 0
for i in range(10000):
        t += r
        j = l.pop(0)
        print("%7.1f" % t, "%5s" % j[1], "%7.1f" % (t - j[3]), \
            "%5.0f" % j[2], "%7.1f" % (t - (j[2] + j[3])))
        j[0] = j[2] + t
        j[3] = t;