December 14th, 2018 03:00

One VLAN misbehaving on Force10 S55 / MXL

A while ago I inherited the network in our DC and with it, apparently an issue with failover between two ISP routers. Both our own routers and the ISP routers (connecting to an IPVPN) are using VLAN 61 to route traffic between them. The routers are connected to two TOR switches, which are not stacked, but have an uplink to a stacked fabric switch (in a blade chassis).

However, traffic seems to not (always) flow over this VLAN between the TOR switches:

  • RTR01 and RTR02 can ping each other
  • RTR01 can ping ISP01 (both the VRRP address and the node address)
  • RTR02 cannot ping ISP01. However, it can get the mac-address via ARP and I see the VRRP multicast messages
  • ISP02 is currently disconnected because we otherwise get a split-brain scenario and all sorts of other fun
  • If I plug my laptop into either switch I see the same thing happening. From TOR1 I cannot ping anything on TOR2 and vice versa. I CAN ping everything on the same switch
  • If I have my laptop on TOR1 and ping RTR02 while running tcpdump on RTR02, I see the echo request from my laptop and the reply the router sends. I also see these in a trace on the port-channel between TOR2 and the fabric, but not on a trace on the port-channel between TOR1 and the fabric.
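
The capture observations in the last bullet can be used to narrow down where the echo reply goes missing. Below is a minimal Python sketch of that reasoning, assuming the traces are complete. The capture-point labels and the function name `first_drop_segment` are made up for illustration; only the seen/not-seen values come from the post.

```python
from typing import Optional, Sequence, Tuple

def first_drop_segment(
    path: Sequence[Tuple[str, bool]]
) -> Optional[Tuple[str, str]]:
    """Walk capture points in path order and return the first adjacent pair
    where the packet was seen at the earlier point but not at the later one.
    Returns None if no such pair exists."""
    for (prev_name, prev_seen), (name, seen) in zip(path, path[1:]):
        if prev_seen and not seen:
            return (prev_name, name)
    return None

# Echo reply travelling from RTR02 back towards the laptop on TOR1.
# Labels are hypothetical; the booleans reflect what the traces showed.
reply_path = [
    ("RTR02 (tcpdump)", True),
    ("TOR2 <-> fabric port-channel", True),
    ("TOR1 <-> fabric port-channel", False),
]

print(first_drop_segment(reply_path))
```

Under these assumptions the reply is last seen on the TOR2 side of the fabric and first missing on the TOR1 side, which points at the fabric stack or its port-channels rather than at RTR02 or TOR2.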

I especially don’t understand how it can work between RTR01 and RTR02 but not for any other devices in the same VLAN.

It is working for other, identically configured, VLANs. So, the connections look like this:

[Figure: VLAN overview]

TOR1 & TOR2

interface Vlan 61
 no ip address
 tagged GigabitEthernet 0/16,24-25
 tagged Port-channel 2
 no shutdown

interface Port-channel 2
 no ip address
 switchport
 switchport mode private-vlan trunk
 channel-member TenGigabitEthernet 0/50-51
 no shutdown

interface TenGigabitEthernet 0/50 (and 0/51)
 no ip address
 no shutdown

interface GigabitEthernet 0/16 (and 0/24, 0/25)
 no ip address
 switchport
 spanning-tree rstp edge-port bpduguard shutdown-on-violation
 no shutdown

Fabric:
interface Vlan 61
 no ip address
 tagged Port-channel 1-2
 no shutdown

interface Port-channel 1
 no ip address
 mtu 12000
 switchport
 switchport mode private-vlan trunk
 channel-member TenGigabitEthernet 0/41
 channel-member TenGigabitEthernet 1/41
 no shutdown

interface Port-channel 2
 no ip address
 mtu 12000
 switchport
 switchport mode private-vlan trunk
 channel-member TenGigabitEthernet 0/42
 channel-member TenGigabitEthernet 1/42
 no shutdown

interface TenGigabitEthernet 0/41 (and 0/42, 1/41 and 1/42)
 no ip address
 mtu 12000
 no shutdown 

All switches show the correct MAC addresses in VLAN 61.

I have the feeling I am missing something obvious but I cannot find it :(

Moderator

8.9K Posts

December 14th, 2018 06:00

Hi,

Are there any connections directly between the TOR switches? Is spanning tree enabled? It seems to be behaving as if there is a loop. Is the firmware up to date? If the laptop is connected to a TOR switch, can it ping devices on the MXL?

5 Posts

December 14th, 2018 13:00

Oh, with regards to firmware:

The Force10 S55 units are running 8.3.5.5; the latest release (also dated) is 8.3.5.6.

The Force10 MXL10/40 units are running 9.2. I believe 9.14.1.1 is the latest version.

So no, it seems we are not running the latest versions.

5 Posts

December 14th, 2018 13:00

I have been wondering about spanning-tree and whether it could be an issue. Sadly, networking is not my primary vocation and while from layer 3 and up I think I know the basics at least, at layer 2 it gets a bit murky.

That being said: yes, on all three switches (two singles, one stack) RSTP is enabled as the spanning-tree protocol. I only really know some basics about spanning tree so I am not sure what I should check, but this is at least what I know:

  • Root bridge ID on all the switches is the same, which seems to me to indicate that at least they are all aware of each other
  • All of the interfaces in the VLAN that have link, have a status of forwarding in spanning-tree
  • All the switchports in the VLAN are set as edge ports (which may not be correct, since it's all routers), and my understanding is that the config "spanning-tree rstp edge-port bpduguard shutdown-on-violation" means the interface either works or is disabled. In the latter case this should be visible in the spanning-tree overview and should be logged (right?)

None of the trunk ports have spanning tree enabled. I have no idea if that is correct (it seems to work for all the other VLANs). Both TOR switches have a port-channel containing two 10-gigabit ports connected to the fabric switch.

Normally only the primary router has IP addresses in this range (generally RTR01). If I add a VLAN and IP address to RTR02 on the box level for VMs running on the blades in the fabric, I can ping those and those can ping RTR02.

With regards to VLAN 61 specifically, RTR02 can ping a VM if it has been set up for VLAN 61. The other way around, however, that VM cannot successfully ping RTR02. As said before, when running tcpdump I can see both the echo request arrive and the reply go out, but the reply does not arrive at the VM. RTR01 works in both directions from the VM and it can also ping the ISP router.

5 Posts

December 17th, 2018 05:00

After reading: Trunking-an-S4810-and-an-S2410-together

I am wondering if my problem is also MTU related. It wouldn't really explain why we haven't noticed this to be a problem before, but:

Currently the TOR switches have no MTU set (including the trunk ports) and thus default to 1554. Also, being Force10 S55 units, I believe they are limited to 9252 bytes.

The MXL however has an MTU of 12000 set for the interfaces in the trunk and the port channel itself.

Not sure how MTU works on layer 2, but the linked forum post seems to describe behaviour similar to what I am finding.

Is there a good reason to set the port channel MTU higher than 1554? The MXL has iSCSI traffic but that doesn't leave towards the TOR switches. The only traffic that goes there is regular traffic, which I expect to be at most 1554 anyway.
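
On the layer 2 question: a switch does not fragment frames, so a frame larger than the MTU of an interface it has to cross is simply dropped. Here is a small Python sketch of that check using the MTU values mentioned in this post. The hop labels and function names are hypothetical, and frame sizes are treated as being in the same units as the switch MTU setting.

```python
from typing import Dict, Tuple

def frame_passes(frame_bytes: int, hop_mtus: Dict[str, int]) -> bool:
    """A frame crosses the path only if it fits within every hop's MTU."""
    return all(frame_bytes <= mtu for mtu in hop_mtus.values())

def bottleneck(hop_mtus: Dict[str, int]) -> Tuple[str, int]:
    """Return the first hop with the smallest MTU and that MTU."""
    return min(hop_mtus.items(), key=lambda item: item[1])

# MTU values from the post: TOR (S55) default 1554, MXL set to 12000.
hops = {
    "TOR1 uplink": 1554,
    "fabric (MXL) port-channels": 12000,
    "TOR2 uplink": 1554,
}

print(frame_passes(1554, hops))   # regular-sized frame
print(frame_passes(9000, hops))   # jumbo frame
print(bottleneck(hops))
```

In this simplified model a regular 1554-byte frame fits everywhere despite the mismatch; only frames larger than the TOR default would be dropped at the TOR uplinks.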

Moderator

8.9K Posts

December 17th, 2018 07:00

Yes, it could be that. I would just set the maximum MTU to the same value on both; even if jumbo frames are not normally being used, it doesn’t hurt anything to have a higher maximum set.

5 Posts

March 6th, 2019 02:00

Just updating this ticket in case anyone else runs into a similar problem. I did update the MTUs but this also did not make a difference in the behaviour.

I am still not sure what the problem really was, but in the end we decided to reboot the switches and after the reboot of the MXL switches, without any configuration changes, the configuration worked as intended.

One other thing I noticed after this reboot is that we now get these messages, which I have not seen before (my experience with Force10 is limited):

Mar 6 08:29:23.128 UTC: %MXL-10/40GbE:0 %LCMGR-5-CMC_CHASSISUPD: Received Chassis update from CMC in Stack-unit 0, Doorbell: 40

I am presuming the fact that this message was not seen before the reboot might indicate that the nodes didn't fully communicate with each other? Configuration updates did seem to apply to both nodes successfully.

I guess one of my previous managers had it right: "Ein reboot macht immer gut" ("a reboot always does good").

(edit: formatting)
