Distributed Spanning Tree Algorithm
Finding a spanning tree for a graph is an easy problem. One can simply pick a node and elect links to new nodes continuously until all nodes are included. Where the problem gets more difficult is when you don't have any knowledge of the graph topology. If you imagine a local area network that is constantly being modified you can see that it would be real nice to avoid having to manually reconfigure the network every time something changes.
To demonstrate the distributed spanning tree algorithm we use a simple network constructed of the following:
Other terminology:
- Root - The root is the ID number of the bridge which will be considered the root of the spanning tree. All bridges must have a unique ID number or the algorithm will have no way to agree on a root and thus will fail.
- Ports - Ports are simply the connections on a bridge. For the sake of the algorithm we can allow unlimited ports. If our bridges were full-blown routers then they would maintain routing tables mapping certain addresses/subnets to. But that's a whole 'nother distributed algorithm. In this case we just want to know which ports are part of the spanning tree and which are not.
- Distance - Distance is the number of hops to the root bridge. In reality each port can be assigned a different distance according to bandwidth or some other metric, but for the sake of explanation hops is good enough.
Our goal is a distributed algorithm that can dynamically handle any physical network changes. The code should be simple and identical in each bridge. Some bandwidth will be wasted due to unused connections, but such is life. This algorithm is very closely related to Dijkstra's Algorithm.
The Algorithm
Each bridge must keep track of the following information:
- selfid - A globally unique hardware number such as MAC address.
- bestroot - The ID# of the bridge currently treated as root.
- bestdist - The distance to bestroot.
- bestport - The port to use to get to that root.
- isactive - Initialized to true, set to false if the bridge does not have any picked ports (see below).
Plus for each port:
- picked - Set to true if this port is a leaf in the spanning tree.
- lasttime - The last time a config message was read on this port.
- root - The root according to this port.
- dist - The distance to this port's root.
- neighbor - The hub connected on this port.
Initially, in the absence of remote information, a bridge will assume bestroot to be itself, and bestdist to be 0. The root sends out special configuration messages on all ports at regular time intervals. Initially all bridges will be sending these, but soon all bridges will recognize the same root. The message includes only three data: bestroot, bestdist, and selfid.
Processing Config Messages
Upon receipt of a config message, the bridge checks to see if the new config is 'better' and updates the info for that port accordingly. Better means that either we haven't heard anything from that port, or the root is lower, or the root is the same but the distance is lower, or the root and distance are the same but the neighbor is lower.
If the port info was updated then those same three data need to be tested against the overall best for the bridge using the same criteria, with bestroot, bestdist, and bestport updated accordingly. The bestdist is, of course, equal to the port distance + 1 since it's effectively one more hop.
After the best info is updated, we must designate which ports are the leaves on the spanning tree. This is done by a similar comparison to see if bestroot, bestport, and bestport's neighbor are superior to the last received info on every port. If so then those ports are picked, otherwise they are not picked and will receive no transmissions (other than config messages) from this bridge.
If no ports are picked then the bridge can be designated inactive indicating that it doesn't need to process any incoming packets at all.
Reconfiguring
The algorithm works pretty well as is, but if a bridge is disconnected then we can start losing packets into the ether as messages are transmitted to a non-existent bridge. For this purpose we set up a time interval on which we expect to hear from each unpicked port (picked ports may be receiving config messages only from us, and thus we should not expect hear back from them). If no config message comes before the end of that interval we trigger a reconfiguration where the best info is recalculated from the remaining ports that are still communicating. This will fix the entire network in at most twice the length of the reconfiguration interval.
Andrew S. Tanenbaum. Computer Networks, 4th Ed. Upper Saddle River, NJ: Prentice Hall, 2003.