MDM event logs showing the frequent disconnect and reconnect of the MDM component:
2023-xx-xx 00:00:21.316 MDM_CLUSTER_LOST_CONNECTION WARNING The MDM, <MDM_Name> (ID <MDM_ID>), has lost connection to the cluster.
2023-xx-xx 00:00:21.419 MDM_CLUSTER_CONNECTED INFO The MDM, <MDM_Name> (ID <MDM_ID>), connected after 100ms
2023-xx-xx 00:00:23.480 MDM_CLUSTER_LOST_CONNECTION WARNING The MDM, <MDM_Name> (ID <MDM_ID>), has lost connection to the cluster.
2023-xx-xx 00:00:23.584 MDM_CLUSTER_CONNECTED INFO The MDM, <MDM_Name> (ID <MDM_ID>), connected after 110ms
sar output from the disconnecting MDM server, showing high TCP retransmissions:
sar -n ETCP 1 -t -f sar.0
             atmptf/s  estres/s retrans/s isegerr/s   orsts/s
00:00:27 AM      0.00      0.00     62.00      0.00      0.00
00:00:28 AM      0.00      0.00     88.12      0.00      0.00
00:00:29 AM      0.00      3.00    100.00      0.00      0.00
00:00:30 AM      0.00      0.00     71.29      0.00      0.00
00:00:31 AM      0.00      0.00     71.00      0.00      0.00
...
00:01:02 AM      0.00      0.00     48.51      0.00      0.00
00:01:03 AM      0.00      0.00     15.00      0.00      0.00
00:01:04 AM      0.00      0.00    207.00      0.00      0.00
00:01:05 AM      0.00      0.00     36.00      0.00      0.00
00:01:06 AM      0.00      0.99    105.94      0.00      0.00
Brief MDM Cluster degraded events
Performance degradation
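When a sar capture is long, the retransmission spikes listed above can be picked out with a short script. The following is a minimal Python sketch, not a Dell-provided tool. It assumes the 12-hour timestamp layout shown in the sar output above, and the threshold of 50 retransmissions per second is a hypothetical value chosen only for illustration.

```python
# Sketch: extract retrans/s from `sar -n ETCP` text and flag high samples.
# Assumes rows of the form: HH:MM:SS AM|PM atmptf/s estres/s retrans/s isegerr/s orsts/s

def parse_etcp(text):
    """Return a list of (timestamp, retrans_per_sec) tuples from sar ETCP text."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has a time, an AM/PM marker, and five numeric columns.
        if len(fields) != 7 or fields[1] not in ("AM", "PM"):
            continue
        try:
            values = [float(f) for f in fields[2:]]
        except ValueError:
            continue  # header row or any other non-numeric line
        samples.append((f"{fields[0]} {fields[1]}", values[2]))  # third column is retrans/s
    return samples


def flag_high(samples, threshold=50.0):
    """Return the samples whose retrans/s exceeds the (hypothetical) threshold."""
    return [s for s in samples if s[1] > threshold]


if __name__ == "__main__":
    # Three rows copied from the sar output in this article.
    sample = """\
00:00:27 AM 0.00 0.00 62.00 0.00 0.00
00:01:03 AM 0.00 0.00 15.00 0.00 0.00
00:01:04 AM 0.00 0.00 207.00 0.00 0.00
"""
    for timestamp, retrans in flag_high(parse_etcp(sample)):
        print(f"{timestamp}: {retrans:.2f} retrans/s")
```

What counts as "high" depends on the environment, so the threshold should be set against a known-good baseline from the same server.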
The MDM server was patched and the Linux kernel was upgraded from 3.x to 5.x. This kernel upgrade changes the default values of many OS parameters. In this case, several parameters changed, including the TCP parameter "net.ipv4.tcp_fack", which was disabled. This change appears to be what caused the high TCP retransmissions.
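To see which value a parameter currently has on a running server, its entry under /proc/sys can be read directly. The sketch below assumes a Linux host and relies on the standard mapping of a sysctl name to a /proc/sys path, with dots replaced by slashes. The helper names are illustrative and are not part of any PowerFlex tooling.

```python
# Sketch: read the live value of a sysctl parameter from /proc/sys.
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name such as net.ipv4.tcp_fack to its /proc/sys path."""
    return Path(root) / name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None  # parameter absent on this kernel, or not readable


if __name__ == "__main__":
    print("net.ipv4.tcp_fack =", read_sysctl("net.ipv4.tcp_fack"))
```

Recording these values before and after a kernel upgrade makes it easier to see which defaults changed.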
The SDS RPM provides a configuration file called emc.conf in the /opt/emc/scaleio/sds/cfg/ directory. This file includes many recommended OS parameters from Dell EMC.
If this is a PowerFlex Rack / Appliance environment, PowerFlex Manager automatically copies the emc.conf file from "/opt/emc/scaleio/sds/cfg" into each server's sysctl.conf and applies it. This happens only during the initial node deployment, and it is possible that sysctl.conf was not updated properly. If the sysctl.conf file is missing or does not contain the correct values, some important parameters can change to the new kernel defaults after an upgrade to kernel 5.x.
In a PowerFlex Rack / Appliance environment, if sysctl.conf does not include all the parameters in emc.conf, it is recommended to copy the contents of emc.conf into each server's /etc/sysctl.conf file. To apply the changes, either reboot the server or run the command "sysctl -p", which loads the settings from /etc/sysctl.conf. Follow proper maintenance best practices when making these changes.
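Before copying anything, it can help to list which emc.conf parameters are missing from /etc/sysctl.conf or set to a different value there. The following Python sketch does this comparison. It assumes that emc.conf uses the usual sysctl.conf "key = value" syntax with "#" or ";" comment lines, which should be confirmed against the actual file. It only inspects the two files named above and does not look at any other sysctl configuration locations. The function names are illustrative.

```python
# Sketch: report emc.conf parameters that are missing from, or differ in, /etc/sysctl.conf.
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Collapse internal whitespace so "4096  87380" equals "4096 87380".
        params[key.strip()] = " ".join(value.split())
    return params


def diff_params(recommended, current):
    """Return (missing, different) relative to the recommended settings."""
    missing = {k: v for k, v in recommended.items() if k not in current}
    different = {
        k: (current[k], v)  # (value in sysctl.conf, value in emc.conf)
        for k, v in recommended.items()
        if k in current and current[k] != v
    }
    return missing, different


def compare_files(recommended_path="/opt/emc/scaleio/sds/cfg/emc.conf",
                  current_path="/etc/sysctl.conf"):
    """Compare the two files on disk; raises OSError if either cannot be read."""
    recommended = parse_sysctl_conf(Path(recommended_path).read_text())
    current = parse_sysctl_conf(Path(current_path).read_text())
    return diff_params(recommended, current)
```

Calling compare_files() on a server returns two dictionaries: the parameters absent from /etc/sysctl.conf, and the parameters present with a different value. Empty results for both suggest the file already carries the recommended settings.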
In a Software Only environment, Dell EMC recommends applying these Linux parameters to each server, but the final decision rests with the business. Consult the OS vendor for best practices or if there are any questions.
All PowerFlex versions