STP Troubleshooting Best Practices

Spanning Tree Protocol Basics:

In a layer 2 environment with no routing, active redundant paths are neither allowed nor desirable, because they can cause loops. A switch segments collision domains but does not segment broadcast domains, so a looping frame reaches every port in the broadcast domain. STP finds redundant links and places one of them in a blocking state. Without STP, every switch floods any frame it receives with an unknown destination media access control (MAC) address, forwarding it out of all other interfaces. Where redundant links exist, this introduces duplicate frames and leads to a loop in which the switches forward the same frames continually. This is not only inefficient but also extremely taxing on network resources. Duplicate frames can also create broadcast storms that threaten network and application stability. With STP and RSTP, only one uplink at a time is active, which stops the flooding and allows reconvergence when a link or interface fails.

The main difference between Rapid Spanning Tree Protocol (RSTP, IEEE 802.1w) and Spanning Tree Protocol (STP, IEEE 802.1D) is that RSTP treats the three STP port states Disabled, Blocking, and Listening as equivalent, because none of them forwards frames or learns MAC addresses. RSTP therefore combines them into a single new state called Discarding. The Learning and Forwarding states remain more or less the same.

STP Definitions:

Root bridge — Center of the spanning tree.
Nonroot bridge — Every switch not elected the root.
Root port — Every nonroot bridge has a single root port, decided based on root path cost.
Designated port — Each segment has a single designated port. All ports on a root bridge are designated.
Nondesignated port — Every switch port that is neither a root port nor designated port starts blocking.

Basic Spanning Tree Operation:

Elect root bridge — The lowest bridge ID wins. The bridge ID consists of a 2-byte priority (0 to 65,535, default 32,768 plus the VLAN number) followed by the 6-byte MAC address, for example 32769 000a.b7d1.9580 for VLAN 1.
Select root port — One per switch, points toward the root bridge.
Select designated port — One per segment, the port with the lowest root path cost.
Block ports — Block non-root and non-designated ports.
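
The root election step can be illustrated with a minimal Python sketch. It assumes a bridge ID is compared as a (priority, MAC address) pair with the lowest value winning, as described above. The second and third switch entries are made-up values for illustration only.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    """Bridge ID: 2-byte priority followed by the 6-byte MAC address."""
    priority: int  # 0-65535; default is 32768 plus the VLAN number
    mac: str       # for example "000a.b7d1.9580"


def mac_to_int(mac: str) -> int:
    """Convert a dotted, colon- or dash-separated MAC string to an integer."""
    hex_digits = mac.replace(".", "").replace(":", "").replace("-", "")
    return int(hex_digits, 16)


def elect_root(bridges: list[BridgeId]) -> BridgeId:
    """Return the bridge with the numerically lowest (priority, MAC) pair."""
    return min(bridges, key=lambda b: (b.priority, mac_to_int(b.mac)))


# Hypothetical switches in VLAN 1; only the first entry comes from the text.
switches = [
    BridgeId(32769, "000a.b7d1.9580"),
    BridgeId(32769, "000a.b7d1.1200"),  # same priority, lower MAC
    BridgeId(36865, "0001.0000.0001"),  # lowest MAC, but higher priority
]

root = elect_root(switches)
print(f"Root bridge: {root.priority} {root.mac}")
```

Priority is compared first, so the switch with the lowest MAC address loses here because its priority is higher. Between the two switches with equal priority, the lower MAC address decides the election.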

Spanning Tree Port States:

STP State	The Port Can…	The Port Cannot…	Duration
Disabled	Nothing	Send/receive data
Blocking	Receive BPDUs	Send/receive data, learn MAC addresses	Indefinite if loop detected
Listening	Send/receive BPDUs	Send/receive data	Forward delay timer (15 seconds)
Learning	Send/receive BPDUs, learn MAC addresses	Send/receive data	Forward delay timer (15 seconds)
Forwarding	Send/receive data, send/receive BPDUs, learn MAC addresses
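
As a rough illustration of the timers in the table above, the following Python sketch computes when a port enters each state once it starts listening. It assumes the port spends exactly one forward delay in Listening and one in Learning. The function name is hypothetical.

```python
def forwarding_timeline(forward_delay: int = 15) -> list[tuple[str, int]]:
    """Return (state, seconds elapsed when the port enters that state) pairs."""
    timeline = []
    elapsed = 0
    for state in ("Listening", "Learning"):
        timeline.append((state, elapsed))
        elapsed += forward_delay  # the port stays in this state for one forward delay
    timeline.append(("Forwarding", elapsed))
    return timeline


for state, seconds in forwarding_timeline():
    print(f"{seconds:>3}s  {state}")
```

With the default 15-second forward delay, the port starts forwarding 30 seconds after it begins listening.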

Securing STP

Root Guard — Enabled on a per-port basis. Spanning Tree Root Guard prevents the root of a spanning-tree instance from changing unexpectedly. Setting the priority of a bridge ID to zero is not enough on its own, because another bridge with a lower MAC address could also set its priority to zero and take over as root. When a port with Root Guard enabled receives a superior BPDU (one carrying a lower bridge ID), the local switch does not allow the new switch to become the root. Instead, the port is placed in the root-inconsistent state, and no data can be sent or received on it until the superior BPDUs stop.
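
The Root Guard decision can be sketched in Python as follows. This is a simplified model, not switch firmware: it assumes a bridge ID is a (priority, MAC as integer) pair, and the function name and return strings are hypothetical.

```python
BridgeId = tuple[int, int]  # (priority, MAC address as an integer)


def handle_bpdu(current_root: BridgeId, advertised: BridgeId, root_guard: bool) -> str:
    """Decide what a port does when it receives a BPDU.

    The BPDU is treated as superior when the advertised bridge ID is
    lower than the bridge ID of the current root.
    """
    superior = advertised < current_root
    if not superior:
        return "no change"
    if root_guard:
        # The port passes no data until the superior BPDUs stop.
        return "root-inconsistent"
    return "advertised bridge becomes root"


# Both bridges use priority 0; the rogue bridge has the lower MAC address.
current_root = (0, 0x000AB7D19580)
rogue = (0, 0x000AB7D11200)

print(handle_bpdu(current_root, rogue, root_guard=False))
print(handle_bpdu(current_root, rogue, root_guard=True))
```

The example shows why a priority of zero alone does not protect the root: without Root Guard, the bridge with the lower MAC address wins.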

BPDU Guard — PortFast moves an end-user port to the forwarding state without going through all of the STP checks, which can induce loops in the network. If any BPDU is received on a port where BPDU Guard is enabled, that port is put into a disabled state and can then only be recovered manually. Spanning Tree BPDU Guard thus disables the port when a new device tries to join the existing STP topology, so that devices that were not originally part of STP cannot influence the topology.

Spanning Tree Enhancements

• Loop Guard — This feature prevents a port from erroneously transitioning from blocking state to forwarding when the port stops receiving BPDUs. The port is marked as being in loop-inconsistent state. In this state, the port does not forward packets. The possible values are Enable or Disable.

• TCN Guard — Enabling the TCN Guard feature restricts the port from propagating any topology change information received through that port. This means that even if a port receives a BPDU with the topology change flag set to true, the port will neither flush its MAC address table nor send out a BPDU with the topology change flag set to true.

• Auto Edge — Enabling the Auto Edge feature allows the port to become an edge port if it does not see BPDUs for some duration.

• BPDU Filter — When enabled, this feature filters the BPDU traffic on this port when STP is enabled on this port.

• BPDU Flood — When enabled, the BPDU Flood feature floods the BPDU traffic arriving on this port when STP is disabled on this port.

Strategy for troubleshooting STP:

Find the root bridge, then identify the designated ports on each subsequent switch. Cisco switches run PVST by default, so you will have to work through each VLAN.
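
When working through each VLAN, it can help to record the Root ID and Bridge ID reported for every VLAN and compare them. The Python sketch below assumes those values have already been collected by hand into a dictionary. The data and names are hypothetical.

```python
def root_status_by_vlan(ids_by_vlan: dict[int, tuple[str, str]]) -> dict[int, bool]:
    """Map each VLAN to True when the Root ID equals the local Bridge ID."""
    return {
        vlan: root_id == bridge_id
        for vlan, (root_id, bridge_id) in ids_by_vlan.items()
    }


# Hypothetical values: VLAN -> (Root ID, Bridge ID) as seen on one switch.
collected = {
    1: ("32769 000a.b7d1.9580", "32769 000a.b7d1.9580"),
    10: ("32778 000a.b7d1.1200", "32778 000a.b7d1.9580"),
}

for vlan, is_root in root_status_by_vlan(collected).items():
    print(f"VLAN {vlan}: {'root' if is_root else 'not root'}")
```

In this made-up data set the switch is the root for VLAN 1 but not for VLAN 10, so the path to the VLAN 10 root would be traced next.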

Use the Diagram of the Network

Before you troubleshoot a bridging loop, it is a good idea to know the following about your network:

The topology of the bridge network
The location of the root bridge

The root bridge in a spanning-tree network is the bridge with the smallest or the lowest bridge ID.

This can be verified by issuing the following command.

#show span detail active

This command shows the Root ID, the Bridge ID, the number of topology changes and when the last one occurred, and the port cost, role, and status.

If the Root ID and the Bridge ID have equal values, this switch is the root bridge (root switch). If the values differ, this switch is not the root.

The location of the blocked ports and the redundant links

This knowledge is essential for at least these two reasons:

In order to know what to fix in the network, you need to know how the network looks when it works correctly.
Most of the troubleshooting steps simply use show commands to try to identify error conditions. Knowledge of the network helps you focus on the critical ports on the key devices.

Identifying the Loop

When you run into a spanning-tree problem, you will likely receive a sudden flood of calls saying the network is either down or running very slowly. The most definitive way to prove that a spanning-tree loop is the cause is to capture traffic on a link. However, you will normally be under pressure to provide a fix, and that is why the next sections discuss the quickest ways to identify a potential spanning-tree issue.

Verify the type of spanning tree configured on your device, and that STP is turned on.

The following command shows the mode being used:

#show spanning-tree

If all the switches are on the same mode, we can move on with troubleshooting. If they are on different modes, we need to correct this, and then test.

If there is a mismatch, you can use this command to change the mode:

console(config)#spanning-tree mode {stp | rstp | mstp}

• stp — Spanning Tree Protocol (STP) is enabled.

• rstp — Rapid Spanning Tree Protocol (RSTP) is enabled.

• mstp — Multiple Spanning Tree Protocol (MSTP) is enabled.
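
Once the mode of each switch has been noted, a mismatch can be spotted with a small comparison such as the Python sketch below. It assumes the modes were collected by hand and that the most common mode is the intended one. The switch names are hypothetical.

```python
from collections import Counter


def find_mode_mismatches(modes: dict[str, str]) -> dict[str, str]:
    """Return the switches whose STP mode differs from the most common mode."""
    if not modes:
        return {}
    majority_mode, _count = Counter(modes.values()).most_common(1)[0]
    return {name: mode for name, mode in modes.items() if mode != majority_mode}


# Hypothetical inventory: switch name -> spanning-tree mode.
modes = {"sw-core": "rstp", "sw-access-1": "rstp", "sw-access-2": "stp"}

for name, mode in find_mode_mismatches(modes).items():
    print(f"{name} runs {mode}; consider aligning it with the other switches")
```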

Duplex Mismatch

A duplex mismatch is a very common problem, generally caused by one side of a link being configured for full duplex and the other side for half duplex.

We can use the following command to check the Duplex status of a port.

# show interfaces detail ethernet 1/g48
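
After checking the port at each end of a link, the two results can be compared. The following Python sketch assumes the duplex value of each end was read from the command output by hand. The switch names and the second port are hypothetical.

```python
from typing import NamedTuple


class LinkEnd(NamedTuple):
    switch: str
    port: str
    duplex: str  # "full" or "half"


def duplex_mismatch(a: LinkEnd, b: LinkEnd) -> bool:
    """Return True when the two ends of a link report different duplex settings."""
    return a.duplex.strip().lower() != b.duplex.strip().lower()


# Hypothetical link between two switches.
near_end = LinkEnd("sw-1", "1/g48", "Full")
far_end = LinkEnd("sw-2", "1/g1", "Half")

if duplex_mismatch(near_end, far_end):
    print(f"Duplex mismatch: {near_end.switch} {near_end.port} is {near_end.duplex}, "
          f"{far_end.switch} {far_end.port} is {far_end.duplex}")
```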

Portfast Configuration Error

Portfast is a feature that one typically wants to enable for a port connected to a host. When the link comes up on this port, the first stages of STP are skipped and the port transitions directly to the forwarding state. This can obviously be dangerous when not used correctly: a loop can occur when a cable is moved, although it should only be transient.

We can look into loopguard to help prevent this issue.

The Loop Guard feature is an enhancement of the Multiple Spanning Tree Protocol. Loop guard protects a network from forwarding loops induced by BPDU packet loss. It can be configured to prevent a blocked port from transitioning to the forwarding state when the port stops receiving BPDUs for some reason (such as a uni-directional link failure).

The following example enables spanning-tree loopguard functionality on all ports.

console(config)#spanning-tree loopguard default
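
The effect of Loop Guard on a blocked port can be modelled with a short Python sketch. This is a simplification for illustration: the function name and state strings are hypothetical, and the intermediate listening and learning stages are omitted.

```python
def state_after_bpdu_loss(state: str, loop_guard: bool) -> str:
    """Return the state a port ends up in after it stops receiving BPDUs."""
    if state != "blocking":
        # Only blocked ports are affected in this simplified model.
        return state
    if loop_guard:
        # The port does not forward packets while in this state.
        return "loop-inconsistent"
    # Without Loop Guard the port eventually starts forwarding,
    # which can create a loop (for example on a uni-directional link).
    return "forwarding"


print(state_after_bpdu_loss("blocking", loop_guard=False))
print(state_after_bpdu_loss("blocking", loop_guard=True))
```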

Are all the interfaces on the Root switch in the forwarding state?

The following command will show us the status of each port.

#show spanning-tree

On the switch, do you see the 'Number of topology changes' increasing?

To verify, run the command:

#show span detail active

When the 'Number of topology changes' increases, it typically means that the physical state of a port on the switch has changed. If the number of topology changes is increasing, continue by verifying non-edge interface state changes. If it is not increasing, move on to analyzing the interfaces for unusually heavy traffic.
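
The decision described above can be written as a small Python sketch. It assumes the 'Number of topology changes' value was noted by hand several times in a row. The function name and sample values are hypothetical.

```python
def next_step(samples: list[int]) -> str:
    """Pick the next troubleshooting step from successive counter readings."""
    increasing = any(later > earlier for earlier, later in zip(samples, samples[1:]))
    if increasing:
        return "verify non-edge interface state changes"
    return "analyze traffic for unusually heavy load"


# Hypothetical readings of 'Number of topology changes', taken a minute apart.
print(next_step([41, 41, 44]))
print(next_step([41, 41, 41]))
```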

On this switch, verify if the non-edge interfaces have changed state.

Run the following commands and analyze the output to verify if an interface has changed state.

#show log

Displays the syslog messages stored in the internal buffer. Look for transitions from forwarding to blocking.

If there are no transitions we can move on to identifying the node from which the switch received the topology change.

If we see changes, we can look at troubleshooting some of the following.

Check for any loose cables.
Check for any outage.
Check if any configuration change has been made.

Since the cause of the topology change was not identified on the current switch, we need to move to the next node in the Spanning Tree to trace the source of the topology change.

We can use spanning-tree debugging to help identify it. Use the show debugging command to display packet tracing configurations.

Example

console#debug spanning-tree bpdu

#show debugging

Analyze the traffic interface activity for unusually heavy traffic.

To do this, run the following command and compare the output with the baseline output captured when traffic is normal. Look at the broadcast transmit and receive counters, focusing not on the raw packet counts but on the percentage of broadcast packets relative to total packets. Keep in mind that if the switch has been running for a long time, it may take days or weeks before the percentage indicates a broadcast storm.

#show statistics (enter specific port or port channel)
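
The percentage calculation can be sketched in Python. To work around the long-uptime problem, this sketch compares two counter samples and computes the percentage over the interval between them. That is one possible approach and an assumption of this example, not a feature of the switch. All counter values are made up.

```python
def broadcast_percent(broadcast_packets: int, total_packets: int) -> float:
    """Return broadcast packets as a percentage of total packets."""
    if total_packets == 0:
        return 0.0
    return 100.0 * broadcast_packets / total_packets


def interval_broadcast_percent(before: tuple[int, int], after: tuple[int, int]) -> float:
    """Broadcast percentage for traffic seen between two (broadcast, total) samples."""
    broadcast_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    return broadcast_percent(broadcast_delta, total_delta)


# Hypothetical (broadcast, total) counters read a few minutes apart.
before = (1_000_000, 100_000_000)
after = (1_900_000, 101_000_000)

print(f"Since boot:    {broadcast_percent(*after):.1f}%")
print(f"Last interval: {interval_broadcast_percent(before, after):.1f}%")
```

In this made-up example the lifetime percentage still looks low, while the interval percentage shows that almost all recent traffic is broadcast.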

Look for Resource Errors

Here is how to check that the device is not running short of CPU resources.

CPU Utilization

console#show process cpu

Memory Usage

console#show memory cpu

Breaking the Loop

In most organizations, the network has become a critical component of running an efficient and profitable business operation. Any downtime or poor performance can directly affect the bottom line of the organization, so chances are you will need to restore the network as quickly as possible, before determining the cause of the problem. You should also take steps to ensure that the problem does not recur. The following strategies can be taken:

Disable port

An effective way to quickly eliminate loops is to manually disable ports that should be in a Blocking state. Performing this action should remove a loop if it has formed and will not affect the network because these ports are normally blocking.

WARNING

Disable ports with caution, as you might accidentally disconnect your Telnet session if you are performing the configuration remotely or disrupt legitimate traffic by shutting down the wrong ports.

shutdown

Use the shutdown command in Interface Configuration mode to disable an interface. To restart a disabled interface, use the no form of this command.

Examples

The following example disables Ethernet port 1/g5.

console(config)#interface ethernet 1/g5

console(config-if-1/g5)# shutdown

The following example re-enables Ethernet port 1/g5.

console(config)#interface ethernet 1/g5

console(config-if-1/g5)# no shutdown

Turning on Event Logging

After restoring the network, you should monitor the network closely for a few hours to ensure the problem does not resurface. An easy way to monitor the network is to turn on event logging/debug for spanning-tree events.

Example

The following example shows how logging is enabled.

console(config)#logging on

Links used

http://www.digitalnetworks-pr.com/Articles/STP.pdf

http://www.informit.com/library/content.aspx?b=CCNP_Studies_Switching&seqNum=42

http://docwiki.cisco.com/wiki/Cisco_Nexus_7000_Series_NX-OS_Troubleshooting_Guide_--_Troubleshooting_STP

http://www.cisco.com/en/US/tech/tk389/tk621/technologies_tech_note09186a00800951ac.shtml#trblshoot

http://kb.juniper.net/InfoCenter/index?page=content&id=KB22832

http://kb.juniper.net/InfoCenter/index?page=content&id=KB22777

http://www.dell.com/downloads/global/products/pwcnt/en/app_note_1.pdf