Commits

John Grossman authored e1d6c080f0b
Make common_time more deferential when coming out of networkless mode.

Addresses issues seen in bug 6260139.

This is a really tough bug to repro, but there is no doubt that it is happening occasionally on our super huge A@H subnet. I collected data all weekend; the failure did not occur, but I got enough to put together a theoretical sequence of events which could trigger this behavior. The sequence goes like this.

1) A network is running and happy with a timeline master M, maintaining timeline X.
2) Device B boots, but its network is taking a long time to come up. After 60 seconds of waiting for the network to come up, device B goes into networkless master mode and creates timeline Y.
3) Device B's network comes up. It immediately sends a master announcement saying that it is the current low-priority master of timeline Y (it's low priority because it has never had any real clients).
4) Master M ignores B because B is low priority.
5) Device C boots and sends out a who-is-master request. It is a race between M and B to see who will respond first. In this case, B responds first.
6) C sends B a request which B receives. B now has its first client and is now high priority. In this scenario, B matches M in all aspects of the priority ranking function, including winning the tie breaker (larger MAC address when interpreted as a 48 bit integer).
7) M sends its master announcement; it is ignored by B since B now wins in the ranking function vs M.
8) Finally, B sends its next master announcement. M sees it and realizes that there is a higher priority master out there (this looks like a bridged network scenario to M). M gives up master status along with timeline X. The clients of M become clients of B and move from timeline X to timeline Y (something which should only be needed during an actual network bridging event).

This change has a few different things meant to severely minimize the chance that this can happen.

First, and the most important change, is that networkless masters do not immediately announce themselves as masters on the network they are joining. Instead, they transition into Ronin to discover any pre-existing masters on the network. If there are no masters out there, the device will simply transition back to master and continue to maintain the timeline it had in networkless mode. In the scenario above, however, B should discover M and become its client, preserving the established timeline X.

Second, any time a device experiences an interface reconfiguration (including coming out of networkless mode), it clears its high priority bit. This is a good thing. The bit used to get set again any time...

1) The device is master and receives a client request.
2) The device becomes a client of another master on the network.
3) The device becomes a master.

Number 3 in this list is a mistake. During master election, the high priority bit should only be set for devices which have been participating in a timeline which has been used by multiple devices. We know that this is the case when we are master and receive a request. We also know that this is the case when we hear from a master and decide to become its client. Simply becoming a master should not make us high priority. This behavior has been removed.

Third, timeouts have been adjusted just for some extra "stickiness" when it comes to master status. Clients now stay in the Ronin state for up to 10 seconds looking for a master, sending up to 20 discovery requests, instead of only 3 seconds (sending 6 requests). The wait-for-election timeout has been adjusted up from 5 seconds to 12.5 seconds to track the longer election cycle as well. Also, while in steady-state, a client will now wait until 10 packets (10 seconds) have gone unanswered by its master before giving up and dropping into Ronin.

Change-Id: I438b39f31265e34d6719d4adfa9e8b95a2afc188
Signed-off-by: John Grossman <johngro@google.com>
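
The behavior described in this message can be sketched roughly as follows. This is a simplified illustration in Python, not the common_time code itself: the `Device` class, the state strings, the handler names, and the constant names are all hypothetical, and the ranking function is reduced to the two aspects mentioned above (the high priority bit, then the larger 48 bit MAC address as tie breaker). Only the numeric timeout values come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Values quoted in the commit message; the constant names are made up here.
RONIN_DISCOVERY_REQUESTS = 20        # previously 6
RONIN_DISCOVERY_WINDOW_S = 10.0      # previously 3.0
WAIT_FOR_ELECTION_TIMEOUT_S = 12.5   # previously 5.0
CLIENT_UNANSWERED_LIMIT = 10         # unanswered packets before dropping to Ronin


@dataclass
class Device:
    mac: int                          # MAC address as a 48 bit integer
    timeline: str
    state: str = "NETWORKLESS_MASTER"
    high_priority: bool = False


def outranks(a: Device, b: Device) -> bool:
    """Simplified ranking: high priority bit first, larger MAC breaks ties."""
    return (a.high_priority, a.mac) > (b.high_priority, b.mac)


def on_interface_reconfigured(dev: Device) -> None:
    """Covers leaving networkless mode: clear the bit and go look for masters."""
    dev.high_priority = False
    dev.state = "RONIN"               # do NOT announce ourselves as master yet


def on_ronin_finished(dev: Device, found: Optional[Device]) -> None:
    """Called once discovery in Ronin has either found a master or timed out."""
    if found is None:
        # Nobody out there: resume mastering our own timeline.
        # Becoming master no longer sets the high priority bit.
        dev.state = "MASTER"
    else:
        # Defer to the existing master and adopt its timeline.
        dev.state = "CLIENT"
        dev.timeline = found.timeline
        dev.high_priority = True      # the timeline is shared by multiple devices


def on_client_request(dev: Device) -> None:
    """A master that hears from a real client becomes high priority."""
    if dev.state == "MASTER":
        dev.high_priority = True
```

Under these assumptions, device B from the scenario above enters Ronin when its network comes up, discovers M, and ends up as a client on timeline X instead of pulling M's clients over to timeline Y.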