Cisco MDS: some hosts execute the Link Reset Protocol without any apparent cause or other errors
Hi there,
On several MDS systems (9140 on 3.3(1c), 9506 on 4.2(1a), and others) we experience an interesting phenomenon. It seems to only occur on a specific type of server. Hopefully, somebody here recognizes the symptoms and can share experiences.
The problem is that on some links in the Active state, the host executes the Link Reset Protocol every now and then. It sends a Link Reset (LR) primitive sequence, the switch responds with Link Reset Response (LRR), the host responds with Idle, and finally the switch sends an Idle and the ports are Active again. In the output of "show interface", such an event is visible because the output LRR counter is incremented by 1 without any other counter increasing. The SNMP counter "fcFcPortLinkResetIns" also increases by 1 each time such an event occurs. This is all that happens: all login sessions over the link remain intact, and no FLOGI, port login, or process login happens afterward.
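To make the "LRR went up and nothing else did" check repeatable, two counter snapshots of the same port can be compared. The following is only a minimal sketch: the dictionary keys (`output_lrr`, `crc_errors`, and so on) are hypothetical names chosen for illustration, not actual "show interface" or SNMP field names, and how the snapshots are collected is left out.

```python
def counter_deltas(before, after, lrr_key="output_lrr"):
    """Compare two counter snapshots of one port.

    Returns (lrr_delta, others_changed), where others_changed maps every
    other counter that moved to the amount it moved by.
    Counter names are hypothetical, for illustration only.
    """
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    lrr_delta = deltas.pop(lrr_key, 0)
    others_changed = {name: d for name, d in deltas.items() if d != 0}
    return lrr_delta, others_changed


# Made-up example snapshots, taken some time apart on the same port.
before = {"output_lrr": 4, "crc_errors": 0, "invalid_tx_words": 0,
          "sync_losses": 0, "link_failures": 0}
after = {"output_lrr": 5, "crc_errors": 0, "invalid_tx_words": 0,
         "sync_losses": 0, "link_failures": 0}

lrr_delta, others = counter_deltas(before, after)
if lrr_delta > 0 and not others:
    print(f"{lrr_delta} link reset(s) with no other counter change")
```

An empty `others` dictionary together with a positive `lrr_delta` matches the symptom described above.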
The puzzling thing is that no other error counters increase: no CRC errors, no InvalidTxWords, no SyncLosses, no LinkFailures, nothing. The physical links seem just fine. Hosts don't really seem to be bothered by this problem, but we do have some Exchange servers that sometimes complain about individual I/Os taking extremely long (60 seconds), without any further effects. However, in almost every case an Exchange log entry coincides with a Link Reset, so this is worth investigating.
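One way to check the "coincides" claim systematically is to match the timestamps of the slow-I/O log entries against the times at which the link reset counter was seen to increase. This is a rough sketch only: the 60-second window is an assumption based on the I/O duration mentioned above, the function name is made up, and the timestamps are invented sample data.

```python
from datetime import datetime, timedelta


def coinciding_events(io_warnings, link_resets, window_seconds=60):
    """Pair each slow-I/O warning with the link resets seen within the window.

    Both inputs are lists of datetime objects; the window size is an assumption.
    """
    window = timedelta(seconds=window_seconds)
    matches = []
    for warning in io_warnings:
        nearby = [reset for reset in link_resets if abs(warning - reset) <= window]
        if nearby:
            matches.append((warning, nearby))
    return matches


# Invented sample timestamps, for illustration only.
io_warnings = [datetime(2010, 8, 10, 3, 15, 0), datetime(2010, 8, 10, 9, 40, 0)]
link_resets = [datetime(2010, 8, 10, 3, 14, 20), datetime(2010, 8, 10, 12, 0, 0)]

matches = coinciding_events(io_warnings, link_resets)
print(f"{len(matches)} of {len(io_warnings)} warnings fall near a link reset")
```

The resolution of such a comparison is limited by how often the counters are polled, so a coarse polling interval would call for a wider window.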
We've performed the usual stuff (replacing SFPs, cables), but as I mentioned above, the physical connections seem fine. This occurs on different switches in different datacenters.
Does anybody recognize this?
Regards,
Jurjen Oskam
newwdomi1
December 16th, 2010 08:00
LR is sent when a link timeout occurs. A link timeout is detected when no R_RDY is received within E_D_TOV after the BB credit has reached 0.
That would mean you are running out of credit too fast.
I would check N_Port BB credit. Default is 16 on MDS N_Port.
What is the distance? (You need about 1 credit per 2 km at 1 Gb/s.)
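As a quick sanity check, the rule of thumb above can be turned into a small calculation. This is a sketch under stated assumptions: it takes "1 credit per 2 km at 1 Gb/s" at face value, assumes the requirement scales linearly with link speed, and assumes full-size frames (smaller frames would need more credits). The function name is made up for illustration.

```python
import math


def min_bb_credits(distance_km, speed_gbps):
    """Estimate the BB credits needed to keep a link of this length busy.

    Rule of thumb: 1 credit per 2 km at 1 Gb/s, assumed to scale
    linearly with speed, for full-size frames. Always at least 1.
    """
    return max(1, math.ceil(distance_km * speed_gbps / 2))


# Compare a few link lengths at 4 Gb/s against the default of 16 credits.
for distance_km in (0.1, 2, 10):
    needed = min_bb_credits(distance_km, speed_gbps=4)
    verdict = "fits within" if needed <= 16 else "exceeds"
    print(f"{distance_km} km at 4 Gb/s: about {needed} credit(s), {verdict} 16")
```

For short in-datacenter cabling the estimate stays far below 16, so distance alone would not explain credit starvation there.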
Maybe check the HBA firmware and driver version and EMC recommended settings.
Dominique
Ali_Kaunain
August 10th, 2010 19:00
Hi Jurjen,
I haven't seen this exact issue. I'd ask you to raise it as a service request with our connectivity support team, as it requires a look at the logs to check whether these link errors show up in the port counters. If the logs come out clean and the issue cannot be addressed, we can then pass it to the engineering team, who can confirm whether it's a known issue or troubleshoot it from their side.
Thanks,
Ali
Jurjen_Oskam
August 11th, 2010 04:00
Hi,
Thanks, we have indeed already involved the vendor of the servers, since the servers are initiating the link reset protocol and the majority of hosts do not show this behaviour. They'll probably bring in a Fibre Channel analyzer to see what's happening, but the reason I posted here was to find out whether someone perhaps recognizes this and hopefully has a quick solution or explanation...
Anyway, I'll post the result of that case here once I have more info, for the sake of the archives.
Regards,
Jurjen
Ali_Kaunain
August 11th, 2010 19:00
Thanks in advance for your efforts, Jurjen. That would really help.
Regards,
Ali