SDS disconnected
We have ScaleIO version 2.0.5014 installed and configured in 3-host mode on CentOS 7.
It worked fine for 2 months.
This morning, 2 of the 3 SDSs (#1 and #2) lost connection and remain "Disconnected".
Rebooting does not help.
The network is working without problems, ping passes between all hosts.
The SDS network test returns an error:
[root@scaleio1 cfg]# scli --start_sds_network_test --sds_ip 192.168.201.24
Error: MDM failed command. Status: SDS is being configured. Please retry the command.
But "scli --query_all_sds" works:
[root@scaleio1 cfg]# scli --query_all_sds
Query-all-SDS returned 3 SDS nodes.
Protection Domain 4ed2eb6300000000 Name: Applico
SDS ID: a5361e3300000002 Name: SDS_192.168.201.24 State: Disconnected, Join-Pending IP: 192.168.201.24 Port: 7072 Version: 2.0.5014
SDS ID: a5361e3200000001 Name: SDS_192.168.201.14 State: Disconnected, Join-Pending IP: 192.168.201.14,192.168.202.12 Port: 7072 Version: 2.0.5014
SDS ID: a5361e3100000000 Name: SDS_192.168.201.34 State: Connected, Joined IP: 192.168.201.34,192.168.202.32 Port: 7072 Version: 2.0.5014
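For anyone who wants to script a check on this, here is a minimal Python sketch that picks the disconnected SDSs out of the query output above. It assumes each SDS is printed on a single line in exactly the field order pasted here (the real layout may differ between versions), and the function names are made up for illustration.

```python
import re

# Sample copied from the "scli --query_all_sds" output pasted above.
SAMPLE = """\
Query-all-SDS returned 3 SDS nodes.
Protection Domain 4ed2eb6300000000 Name: Applico
SDS ID: a5361e3300000002 Name: SDS_192.168.201.24 State: Disconnected, Join-Pending IP: 192.168.201.24 Port: 7072 Version: 2.0.5014
SDS ID: a5361e3200000001 Name: SDS_192.168.201.14 State: Disconnected, Join-Pending IP: 192.168.201.14,192.168.202.12 Port: 7072 Version: 2.0.5014
SDS ID: a5361e3100000000 Name: SDS_192.168.201.34 State: Connected, Joined IP: 192.168.201.34,192.168.202.32 Port: 7072 Version: 2.0.5014
"""

# Assumed line layout: ID, Name, State, IP list, Port, Version, in that order.
SDS_LINE = re.compile(
    r"SDS ID:\s+(?P<id>\S+)\s+Name:\s+(?P<name>\S+)\s+State:\s+(?P<state>.+?)"
    r"\s+IP:\s+(?P<ips>\S+)\s+Port:\s+(?P<port>\d+)\s+Version:\s+(?P<version>\S+)"
)


def parse_sds_lines(text):
    """Return one dict per SDS line found in the query output."""
    records = []
    for line in text.splitlines():
        match = SDS_LINE.search(line)
        if match:
            record = match.groupdict()
            record["ips"] = record["ips"].split(",")  # an SDS may list several IPs
            records.append(record)
    return records


def disconnected(records):
    """Keep only the SDSs whose state starts with 'Disconnected'."""
    return [r for r in records if r["state"].startswith("Disconnected")]


if __name__ == "__main__":
    for sds in disconnected(parse_sds_lines(SAMPLE)):
        print(sds["name"], sds["state"], ",".join(sds["ips"]))
```

In practice the text would come from running the command itself rather than from a pasted sample.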
"scli --query_sds_connectivity_status --protection_domain_name Applico" hangs with no response.
What can we do to resolve this issue?
PhilipF1
June 17th, 2016 02:00
The root cause of this issue is as follows:
For 6TB disks, the memory allocation in the SDS is too small (the default value of tgt_mem__stmp_size_in_lbs is too small).
The solution:
tgt_mem__stmp_size_in_lbs=262144
Important note: This operation must be done on all the SDSs associated with the 6TB disks, and it must be performed one by one (not in parallel).
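As a rough illustration of the "one by one" rule, here is a small Python sketch. It only shows the shape of the procedure: `set_conf_value` edits key=value text in memory, and `apply_one_by_one` walks the hosts in order. Where the setting lives on disk and how the SDS picks it up are not stated in this thread, so the `apply` and `is_healthy` callbacks are hypothetical placeholders, and stopping when a host does not come back healthy is my own assumption, not part of the fix above.

```python
def set_conf_value(conf_text, key, value):
    """Return conf_text with key=value set, replacing an existing entry or appending one."""
    lines = conf_text.splitlines()
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the part before the first '=' so values are never matched.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def apply_one_by_one(hosts, apply, is_healthy):
    """Apply the change to each host in turn, never in parallel.

    apply(host) and is_healthy(host) are hypothetical callbacks supplied by the
    caller. Halting on an unhealthy host is an assumption made for safety.
    """
    done = []
    for host in hosts:
        apply(host)
        if not is_healthy(host):
            raise RuntimeError(f"{host} did not recover; stopping before the next SDS")
        done.append(host)
    return done


if __name__ == "__main__":
    print(set_conf_value("", "tgt_mem__stmp_size_in_lbs", 262144), end="")
```

The point of the loop is simply that the next SDS is not touched until the previous one has been handled.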
pawelw1
May 19th, 2016 07:00
Hi,
Are both disconnected SDSs reachable at all? Did their IPs change, or was any upgrade or maintenance done at that time?
Can you collect the get_info output from both SDSs (run /opt/emc/scaleio/sds/diag/get_info.sh) and from the primary MDM
(log in to ScaleIO with "scli --login", then run /opt/emc/scaleio/mdm/diag/get_info.sh), and upload the 3 archives to the following FTP location:
https://ftp.emc.com/action/login?domain=ftp.emc.com&username=nC0Qxon4B&password=FA5gFA5FAg
Thank you, Pawel
PhilipF1
May 19th, 2016 23:00
Hi,
Both disconnected SDSs are reachable.
There are even records in the MDM log when we restart the SDS machine:
There were no IP changes, and no maintenance or upgrade was performed. It was just the regular full-time workload, copying data to storage.
We uploaded the reports to the FTP.
Thank you for the support!