December 14th, 2018 03:00
One VLAN misbehaving on Force10 S55 / MXL
A while ago I inherited the network in our DC and with it, apparently an issue with failover between two ISP routers. Both our own routers and the ISP routers (connecting to an IPVPN) are using VLAN 61 to route traffic between them. The routers are connected to two TOR switches, which are not stacked, but have an uplink to a stacked fabric switch (in a blade chassis).
However, traffic seems to not (always) flow over this VLAN between the TOR switches:
- RTR01 and RTR02 can ping each other
- RTR01 can ping ISP01 (both the VRRP address and the node address)
- RTR02 cannot ping ISP01. However, it can get the mac-address via ARP and I see the VRRP multicast messages
- ISP02 is currently disconnected because we otherwise get a split-brain scenario and all sorts of other fun
- If I plug my laptop into either switch I see the same thing happening. From TOR1 I cannot ping anything on TOR2 and vice versa. I CAN ping everything on the same switch.
- If I have my laptop on TOR1 and ping RTR02 while running tcpdump on RTR02, I see the echo request from my laptop and the reply the router sends. I also see these in a trace on the port-channel between TOR2 and the fabric, but not on a trace on the port-channel between TOR1 and the fabric.
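The capture results in the last bullet narrow down where the echo reply disappears. A minimal Python sketch of that reasoning follows. The capture point names and the `locate_loss` helper are hypothetical, and it assumes the trace result means "the reply is not seen on the TOR1 side" and that the laptop never receives it (since the ping fails).

```python
def locate_loss(observations):
    """Return the (last seen, first missing) capture points, or None.

    observations is an ordered list of (capture point, seen) pairs
    following the direction the packet travels.
    """
    previous = None
    for point, seen in observations:
        if not seen:
            return (previous, point)
        previous = point
    return None


# Path of the echo reply from RTR02 back to the laptop on TOR1.
# The True/False values are taken from the description above.
reply_path = [
    ("RTR02 (tcpdump)", True),
    ("port-channel TOR2 <-> fabric", True),
    ("port-channel TOR1 <-> fabric", False),
    ("laptop on TOR1", False),
]

print(locate_loss(reply_path))
```

Under these assumptions the reply is lost between the two port-channels, which would point at the fabric switch or its links rather than at the routers.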
I especially don’t understand how it can work between RTR01 and RTR02 but not between any other devices in the same VLAN.
It is working for other, identically configured, VLANs. So, the connections look like this:
[Figure: VLAN overview]
TOR1 & TOR2:

interface Vlan 61
 no ip address
 tagged GigabitEthernet 0/16,24-25
 tagged Port-channel 2
 no shutdown

interface Port-channel 2
 no ip address
 switchport
 switchport mode private-vlan trunk
 channel-member TenGigabitEthernet 0/50-51
 no shutdown

interface TenGigabitEthernet 0/50 (and 0/51)
 no ip address
 no shutdown

interface GigabitEthernet 0/16 (and 0/24, 0/25)
 no ip address
 switchport
 spanning-tree rstp edge-port bpduguard shutdown-on-violation
 no shutdown

Fabric:

interface Vlan 61
 no ip address
 tagged Port-channel 1-2
 no shutdown

interface Port-channel 1
 no ip address
 mtu 12000
 switchport
 switchport mode private-vlan trunk
 channel-member TenGigabitEthernet 0/41
 channel-member TenGigabitEthernet 1/41
 no shutdown

interface Port-channel 2
 no ip address
 mtu 12000
 switchport
 switchport mode private-vlan trunk
 channel-member TenGigabitEthernet 0/42
 channel-member TenGigabitEthernet 1/42
 no shutdown

interface TenGigabitEthernet 0/41 (and 0/42, 1/41 and 1/42)
 no ip address
 mtu 12000
 no shutdown
All switches show the correct mac addresses in VLAN 61.
I have the feeling I am missing something obvious but I cannot find it :(
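When comparing VLAN membership across switches, it can help to expand the compact port lists from the config (such as `0/16,24-25`) into individual ports. A small Python sketch follows. The `expand_ports` helper is hypothetical, and it assumes the list syntax means "slot 0, ports 16, 24 and 25", which matches the interfaces listed in the config above.

```python
def expand_ports(spec):
    """Expand a 'slot/ports' list such as '0/16,24-25' into single ports."""
    slot, _, ports = spec.partition("/")
    result = []
    for part in ports.split(","):
        first, _, last = part.partition("-")
        last = last or first  # a single port is a range of length one
        for port in range(int(first), int(last) + 1):
            result.append(f"{slot}/{port}")
    return result


print(expand_ports("0/16,24-25"))
print(expand_ports("0/50-51"))
```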
DELL-Josh Cr
Moderator
December 14th, 2018 06:00
Hi,
There are not any connections directly between the TOR switches? Is spanning tree enabled? It seems like it is behaving like there is a loop. Is the firmware up to date? If the laptop is connected to a TOR switch can it ping devices on the MXL?
AndrevdG
December 14th, 2018 13:00
Oh, with regards to firmware:
The Force10 S55 units are running 8.3.5.5, latest one (also dated) is 8.3.5.6
The Force10 MXL10/40 units are running 9.2. I believe 9.14.1.1 is the latest version.
So no, it seems we are not running the latest versions.
AndrevdG
December 14th, 2018 13:00
I have been wondering about spanning-tree and whether it could be an issue. Sadly, networking is not my primary vocation and while from layer 3 and up I think I know the basics at least, at layer 2 it gets a bit murky.
That being said: yes, RSTP is enabled as the spanning-tree protocol on all three switches (two singles, one stack). I only really know some basics about spanning tree, so I am not sure what I should check, but this is at least what I know:
None of the trunk ports have spanning tree enabled. I have no idea if that is correct (it seems to work for all the other VLANs). Both TOR switches have a port-channel containing two 10-gigabit ports connected to the fabric switch.
Normally only the primary router has IP addresses in this range (generally RTR01). If I add a VLAN and IP to RTR02 on the box level for VMs running on the blades in the fabric, I can ping those and those can ping RTR02.
With regards to VLAN 61 specifically, RTR02 can ping a VM if it has been set up for VLAN 61. The other way around, however, that VM cannot successfully ping RTR02. As said before, when running tcpdump I can see both the echo request arrive and the reply go out, but the reply does not arrive at the VM. RTR01 works in both directions from the VM and it can also ping the ISP router.
AndrevdG
December 17th, 2018 05:00
After reading: Trunking-an-S4810-and-an-S2410-together
I am wondering if my problem is also MTU related. It wouldn't really explain why we haven't noticed this to be a problem before, but:
Currently the TOR switches have no MTU set (including the trunk ports) and thus default to 1554. Also, being Force10 S55 units, I believe they are limited to 9252 bytes.
The MXL however has an MTU of 12000 set for the interfaces in the trunk and the port channel itself.
I am not sure how MTU works on layer 2, but the linked forum post seems to describe behaviour similar to what I am finding.
Is there a good reason to set the port channel MTU higher than 1554? The MXL has iSCSI traffic, but that doesn't leave for the TOR switches. The only traffic that goes there is regular traffic, which I expect to be all 1554 anyway.
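The MTU values mentioned above can be compared per link with a short Python sketch. The link names and the `mtu_mismatches` helper are hypothetical. The numbers are the ones from this post (1554 on the TOR side, 12000 on the MXL side). Whether the mismatch actually causes the symptom is a separate question.

```python
def effective_mtu(mtu_a, mtu_b):
    """The largest frame both ends of a link accept is the smaller MTU."""
    return min(mtu_a, mtu_b)


def mtu_mismatches(links):
    """Return (link, mtu_a, mtu_b) for every link whose two ends differ."""
    return [(name, a, b) for name, (a, b) in links.items() if a != b]


# (TOR side, fabric side) MTU per uplink, as described in the post.
links = {
    "TOR1 uplink": (1554, 12000),
    "TOR2 uplink": (1554, 12000),
}

for name, a, b in mtu_mismatches(links):
    print(f"{name}: {a} vs {b}, effective MTU {effective_mtu(a, b)}")
```

Frames up to 1554 bytes fit on both ends here, so only larger frames sent from the fabric side would be affected by the difference.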
DELL-Josh Cr
Moderator
December 17th, 2018 07:00
Yes, it could be that. I would just set the maximum MTU to the same value on both. Even if jumbo frames are not normally being used, it doesn’t hurt anything to have a higher maximum set.
AndrevdG
March 6th, 2019 02:00
Just updating this ticket in case anyone else runs into a similar problem. I did update the MTUs but this also did not make a difference in the behaviour.
I am still not sure what the problem really was, but in the end we decided to reboot the switches and after the reboot of the MXL switches, without any configuration changes, the configuration worked as intended.
One other thing I noticed after this reboot is that we now get these messages, which I have not seen before (my experience with Force10 is limited):
Mar 6 08:29:23.128 UTC: %MXL-10/40GbE:0 %LCMGR-5-CMC_CHASSISUPD: Received Chassis update from CMC in Stack-unit 0, Doorbell: 40
I am presuming the fact that this message was not seen before the reboot might indicate that the nodes didn't fully communicate with each other? Configuration updates did seem to apply to both nodes successfully.
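To check whether this message also shows up in older logs, the line can be split into its parts with a short Python sketch. The regular expression and the `parse_event` helper are hypothetical and only assume the `%FACILITY-SEVERITY-MNEMONIC: text` layout visible in the line above. If the switch follows the standard syslog numbering, severity 5 would be a notice rather than an error.

```python
import re

# Matches the "%FACILITY-SEVERITY-MNEMONIC: text" part of the log line.
EVENT_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)$"
)


def parse_event(line):
    """Return the event fields as a dict, or None if the line does not match."""
    match = EVENT_RE.search(line)
    return match.groupdict() if match else None


line = (
    "Mar 6 08:29:23.128 UTC: %MXL-10/40GbE:0 %LCMGR-5-CMC_CHASSISUPD: "
    "Received Chassis update from CMC in Stack-unit 0, Doorbell: 40"
)
event = parse_event(line)
print(event["facility"], event["severity"], event["mnemonic"])
```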
I guess one of my previous managers had it right: "Ein reboot macht immer gut" (a reboot always does good).