John Grossman authored e1d6c080f0b, 10 April 2012
Make common_time more deferential when coming out of networkless mode.

Addresses issues seen in bug 6260139.

This is a really tough bug to repro, but there is no doubt that it is
happening occasionally on our super huge A@H subnet.  I have collected
data all weekend; the failure did not occur, but I got enough to have
a theoretical sequence of events which could trigger this behavior.
The sequence goes like this.

1) A network is running and happy with a timeline master M,
   maintaining timeline X.
2) Device B boots, but its network is taking a long time to come up.
   After 60 seconds of waiting for the network to come up, device B
   goes into networkless master mode and creates timeline Y.
3) Device B's network comes up.  It immediately sends a master
   announcement saying that it is the current low-priority master of
   timeline Y (it is low priority because it has never had any real
   clients).
4) Master M ignores B because B is low priority.
5) Device C boots and sends out a who-is-master request.  It is a race
   between M and B to see who will respond first.  In this case, B
   responds first.
6) C sends B a request which B receives.  B now has its first client
   and is now high priority.  In this scenario, B ties M on every
   other aspect of the priority ranking function and wins the tie
   breaker (larger MAC address when interpreted as a 48-bit integer).
7) M sends its master announcement; it is ignored by B since B
   now wins in the ranking function vs M.
8) Finally, B sends its next master announcement.  M sees it, realizes
   that there is a higher priority master out there (looks like a
   bridged network scenario to M).  M gives up master status along
   with timeline X.  The clients of M become clients of B and move
   from timeline X to timeline Y (something which should only be
   needed during an actual network bridging event)
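
Steps 4 through 8 hinge on how two masters are ranked against each
other.  A minimal Python sketch, assuming the ranking looks only at the
high priority bit and then at the MAC address as a 48-bit integer (the
real function may compare more fields; MasterInfo, mac_as_int and
outranks are hypothetical names, and the MAC addresses are made up):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterInfo:
    """Hypothetical summary of what a master announcement carries."""
    name: str
    high_priority: bool  # set once the device has shared a timeline
    mac: str             # colon-separated, e.g. "00:11:22:33:44:55"


def mac_as_int(mac: str) -> int:
    """Interpret a colon-separated MAC address as a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def outranks(a: MasterInfo, b: MasterInfo) -> bool:
    """Return True if `a` wins the (simplified) ranking against `b`."""
    if a.high_priority != b.high_priority:
        return a.high_priority
    # Tie breaker from the text: the larger MAC address wins.
    return mac_as_int(a.mac) > mac_as_int(b.mac)


m = MasterInfo("M", high_priority=True, mac="00:11:22:33:44:55")
b_low = MasterInfo("B", high_priority=False, mac="00:11:22:33:44:99")
b_high = MasterInfo("B", high_priority=True, mac="00:11:22:33:44:99")

print(outranks(m, b_low))   # True: step 4, M ignores low-priority B
print(outranks(b_high, m))  # True: steps 7-8, B wins the tie breaker
```

Under these assumptions the only thing separating step 4 from step 7
is B's high priority bit, which is why a single client request is
enough to flip the outcome.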

This change makes a few different adjustments meant to greatly reduce
the chance that this can happen.

First, and the most important change, is that networkless masters do
not immediately announce themselves as masters on the network they are
joining.  Instead, they transition into Ronin to discover any
pre-existing masters on the network.  If there are no masters out
there, the device will simply transition back to master and continue
to maintain the timeline it had in networkless mode.  In the scenario
above, however, B should discover M and become its client, preserving
the established timeline X.
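
The new behavior can be pictured as a small state machine.  This is a
simplified Python sketch, not the actual common_time code: the state
names follow the text, the function names are hypothetical, and the
wait-for-election step is left out.

```python
from enum import Enum, auto


class State(Enum):
    NETWORKLESS_MASTER = auto()
    RONIN = auto()
    CLIENT = auto()
    MASTER = auto()


def on_network_up(state: State) -> State:
    """A networkless master no longer announces itself; it looks first."""
    if state is State.NETWORKLESS_MASTER:
        return State.RONIN
    return state


def on_discovery_finished(state: State, master_found: bool) -> State:
    """Resolve Ronin once discovery has an answer or has timed out."""
    if state is not State.RONIN:
        return state
    # Found a master: join its timeline.  Otherwise keep our own timeline.
    return State.CLIENT if master_found else State.MASTER


state = on_network_up(State.NETWORKLESS_MASTER)
joined = on_discovery_finished(state, master_found=True)   # B finds M
alone = on_discovery_finished(state, master_found=False)   # empty network
print(state.name, joined.name, alone.name)  # RONIN CLIENT MASTER
```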

Second, any time a device experiences an interface reconfiguration
(including coming out of networkless mode), it clears its high
priority bit.  This is a good thing.  The bit used to get set again
any time...

1) The device is master and receives a client request.
2) The device becomes a client of another master on the network.
3) The device becomes a master.

Number 3 in this list is a mistake.  During master election, the high
priority bit should only be set for devices which have been
participating in a timeline which has been used by multiple devices.
We know that this is the case when we are master and receive a
request.  We also know that this is the case when we hear from a
master and decide to become its client.  Simply becoming a master
should not make us high priority.  This behavior has been removed.
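
The resulting rules for the high priority bit, as a hedged Python
sketch (the Device class and its method names are hypothetical and
only model the bookkeeping described above):

```python
class Device:
    """Hypothetical model of the high priority bit bookkeeping."""

    def __init__(self) -> None:
        self.high_priority = False

    def on_interface_reconfigured(self) -> None:
        # Includes coming out of networkless mode.
        self.high_priority = False

    def on_client_request_while_master(self) -> None:
        # Someone else is using our timeline.
        self.high_priority = True

    def on_become_client(self) -> None:
        # We joined a timeline that another device maintains.
        self.high_priority = True

    def on_become_master(self) -> None:
        # Deliberately leaves the bit alone: becoming master by itself
        # says nothing about whether the timeline is shared.
        pass


b = Device()
b.on_interface_reconfigured()
b.on_become_master()
print(b.high_priority)  # False
b.on_client_request_while_master()
print(b.high_priority)  # True
```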

Third, timeouts have been adjusted just for some extra "stickiness"
when it comes to master status.  Clients now stay in the Ronin state
for up to 10 seconds looking for a master, sending up to 20 discovery
requests, instead of only 3 seconds (sending 6 requests).  The
wait-for-election timeout has been adjusted up from 5 seconds to 12.5
seconds to track the longer election cycle as well.  Also, while in
steady-state, clients will now wait until 10 packets (10 seconds)
have gone unanswered by their master before giving up and dropping
into Ronin.
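
For reference, the old and new numbers side by side in a small Python
sketch.  The ElectionTimeouts container and master_lost helper are
hypothetical, and the even spacing of discovery requests is an
assumption; only the quoted values come from this change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectionTimeouts:
    """Hypothetical container for the values quoted in the text."""
    ronin_timeout_s: float      # how long a client searches for a master
    ronin_requests: int         # discovery requests sent in that window
    wait_for_election_s: float  # wait-for-election timeout

    def discovery_interval_s(self) -> float:
        # Assumption: discovery requests are evenly spaced over the window.
        return self.ronin_timeout_s / self.ronin_requests


OLD = ElectionTimeouts(3.0, 6, 5.0)
NEW = ElectionTimeouts(10.0, 20, 12.5)

# Steady state: "10 packets (10 seconds)" implies one packet per second.
MAX_UNANSWERED_PACKETS = 10


def master_lost(unanswered: int) -> bool:
    """True once enough consecutive packets have gone unanswered."""
    return unanswered >= MAX_UNANSWERED_PACKETS


print(OLD.discovery_interval_s(), NEW.discovery_interval_s())  # 0.5 0.5
```

Under the even-spacing assumption the request interval is 0.5 seconds
both before and after; only the number of attempts grows.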

Change-Id: I438b39f31265e34d6719d4adfa9e8b95a2afc188
Signed-off-by: John Grossman <johngro@google.com>