NFS exports are created on the destination NAS server through a backup interface.
Clients experience random hangs and performance issues when accessing the exports on the destination side of replication.
Network trace analysis shows that the NFS requests reach the NAS server, but the NAS server does not respond.
/var/log/sm_daemon.log
2021-06-01 16:01:43.049778 [7f1f973ea700]TRACE sm_del_all_by_mark delete_objects(509): Deleting <NAddr>: IPv4 link=0x000c scope=0 label='bond3.218' addr='x.x.x.x/16' flags=Secondary+/Temporary+/NoDAD-/Optimistic-/DADFailed-/Homeaddress-/Deprecated-/Tentative-/Permanent+
2021-06-01 16:01:43.051418 [7f1f973ea700]TRACE sm_del_all_by_mark delete_objects(509): Deleting <NAddr>: IPv4 link=0x000e scope=0 label='bond3.661' addr='x.x.x.x/22' flags=Secondary+/Temporary+/NoDAD-/Optimistic-/DADFailed-/Homeaddress-/Deprecated-/Tentative-/Permanent+
<--- IP in question
2021-06-01 16:01:43.052528 [7f1f973ea700]TRACE sm_del_all_by_mark delete_objects(509): Deleting <NAddr>: IPv4 link=0x000f scope=0 label='bond3.671' addr='x.x.x.x/22' flags=Secondary+/Temporary+/NoDAD-/Optimistic-/DADFailed-/Homeaddress-/Deprecated-/Tentative-/Permanent+
2021-06-01 16:01:43.053629 [7f1f973ea700]TRACE sm_del_all_by_mark delete_objects(509): Deleting <NAddr>: IPv4 link=0x29ba scope=0 label='bond3.651' addr='x.x.x.x/22' flags=Secondary-/Temporary-/NoDAD-/Optimistic-/DADFailed-/Homeaddress-/Deprecated-/Tentative-/Permanent+
<--- IP that prompts bond removal
2021-06-01 16:01:43.055136 [7f1f973ea700]TRACE sm_del_all_by_mark delete_objects(509): Deleting <NLink>: type=4 link=0x29ba media=0x01 flags=Connected+/Up+/Promisc-/Master-/Slave- flags=0x00011043 flagsChange=0x00000000 MTU=1500 MAC='00:60:16:5c:56:04' name='bondx.xxx' parentLink=0x000a VLAN=xxx kind='vlan' PCIAddr='' NICName='' SpeedDuplex/AutoNeg=0/0,supported{}/0,partner{}/0,advertised{}/0 FlowControl/AutoNeg=0/0 NetNS=0
<--- bond removal
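In the excerpt above, the interface removal is correlated with the address deletion through the shared link id (link=0x29ba appears on both the <NAddr> and the <NLink> record). The following is a minimal sketch of that correlation, assuming the TRACE line format shown above. The helper name removed_links and the regular expressions are hypothetical and were written only against this excerpt, not against any documented log specification.

```python
import re
from collections import defaultdict

# Assumption: record layout copied from the sm_daemon.log excerpt above.
DELETE_RE = re.compile(r"Deleting <(?P<kind>NAddr|NLink)>:.*?\blink=(?P<link>0x[0-9a-fA-F]+)")
LABEL_RE = re.compile(r"\blabel='(?P<label>[^']*)'")
NAME_RE = re.compile(r"\bname='(?P<name>[^']*)'")


def removed_links(lines):
    """Hypothetical helper: map each deleted <NLink> link id to its interface
    name and to the labels of the <NAddr> records deleted on the same link id."""
    labels_by_link = defaultdict(list)
    removed = {}
    for line in lines:
        m = DELETE_RE.search(line)
        if not m:
            continue  # not an address or link deletion record
        link = m.group("link")
        if m.group("kind") == "NAddr":
            lm = LABEL_RE.search(line)
            if lm:
                labels_by_link[link].append(lm.group("label"))
        else:
            # An <NLink> deletion means the interface itself is being removed.
            nm = NAME_RE.search(line)
            removed[link] = {
                "name": nm.group("name") if nm else None,
                "labels": list(labels_by_link.get(link, [])),
            }
    return removed
```

Used against the log file, for example `with open("/var/log/sm_daemon.log") as f: print(removed_links(f))`, this would report link 0x29ba with the label bond3.651, matching the "IP that prompts bond removal" annotation above.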
/var/log/messages
2021-06-01T16:01:53+00:00 self kernel: [15786728.512641] unregister_netdevice: waiting for bond3.651 to become free. Usage count = 2
2021-06-01T16:02:03+00:00 self kernel: [15786738.588540] unregister_netdevice: waiting for bond3.651 to become free. Usage count = 2
2021-06-01T16:02:13+00:00 self kernel: [15786748.700452] unregister_netdevice: waiting for bond3.651 to become free. Usage count = 2
* Please note the bond in question is associated with the VDM.
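As the timestamps above show, the kernel repeats the unregister_netdevice message roughly every 10 seconds while the device still has outstanding references, so a device that appears several times in a row is a candidate for this condition. The sketch below counts those messages per device, assuming the exact wording shown in the /var/log/messages excerpt. The helper name stuck_devices and the min_repeats threshold are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Assumption: message wording copied from the /var/log/messages excerpt above.
WAIT_RE = re.compile(
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free\. "
    r"Usage count = (?P<count>\d+)"
)


def stuck_devices(lines, min_repeats=2):
    """Hypothetical helper: return {device: (wait_messages, last_usage_count)}
    for devices reported at least `min_repeats` times."""
    waits = Counter()
    last_count = {}
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            dev = m.group("dev")
            waits[dev] += 1
            last_count[dev] = int(m.group("count"))
    return {dev: (n, last_count[dev]) for dev, n in waits.items() if n >= min_repeats}
```

For example, `with open("/var/log/messages") as f: print(stuck_devices(f))` would flag bond3.651 for the three messages shown above.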
This issue can occur under the following circumstances:
- The NAS server is the destination side of replication
- Packet reflect is enabled on the source side of replication
The destination NAS server can encounter problems when bringing up the IP reflect cache after coming out of a paused state.
During normal operation, the destination NAS server pauses, then plays the delta information associated with replication.
The NAS server then un-pauses and brings the interfaces, along with the associated IP reflect cache, back online.
When the NAS server brings an interface back online, there is a chance that it will hang and fail to respond to NFS client requests.
Please note that this condition is intermittent: the NAS server can hit the issue right away, or it can run for a long period of time before the bug is triggered.
This issue can manifest itself in several ways:
- The SP can panic.
- Creating or deleting interfaces on the destination NAS server can fail and leave the NAS server in a degraded state.