May 18th, 2016 03:00

SDS disconnected

We have ScaleIO version 2.0.5014 installed and configured in 3-host mode on CentOS 7.

It worked fine for 2 months.

This morning, 2 of the 3 SDSs (#1 and #2) lost connection and remain "Disconnected".

Rebooting does not help.

The network is working without problems, ping passes between all hosts.

The SDS network test returns an error:

[root@scaleio1 cfg]# scli --start_sds_network_test --sds_ip 192.168.201.24

Error: MDM failed command.  Status: SDS is being configured. Please retry the command.

However, "scli --query_all_sds" works:

[root@scaleio1 cfg]# scli --query_all_sds

Query-all-SDS returned 3 SDS nodes.

Protection Domain 4ed2eb6300000000 Name: Applico

SDS ID: a5361e3300000002 Name: SDS_192.168.201.24 State: Disconnected, Join-Pending IP: 192.168.201.24 Port: 7072 Version: 2.0.5014

SDS ID: a5361e3200000001 Name: SDS_192.168.201.14 State: Disconnected, Join-Pending IP: 192.168.201.14,192.168.202.12 Port: 7072 Version: 2.0.5014

SDS ID: a5361e3100000000 Name: SDS_192.168.201.34 State: Connected, Joined IP: 192.168.201.34,192.168.202.32 Port: 7072 Version: 2.0.5014
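
For anyone who wants to script this check, the output above can be filtered for SDSs that are not connected. This is only a minimal Python sketch: it assumes the exact line layout shown in this post (SDS ID, Name, State, IP, Port, Version, in that order), and the names `parse_sds_lines` and `disconnected` are illustrative helpers, not part of ScaleIO.

```python
import re

# Assumption: field order and labels match the output pasted in this thread.
SDS_LINE = re.compile(
    r"SDS ID: (?P<id>\S+) Name: (?P<name>\S+) "
    r"State: (?P<state>.+?) IP: (?P<ips>\S+) "
    r"Port: (?P<port>\d+) Version: (?P<version>\S+)"
)


def parse_sds_lines(text):
    """Return one dict per SDS line found in `scli --query_all_sds` output."""
    nodes = []
    for line in text.splitlines():
        match = SDS_LINE.search(line)
        if match is None:
            continue  # skip headers, blank lines, anything not an SDS row
        node = match.groupdict()
        node["ips"] = node["ips"].split(",")  # an SDS may list several IPs
        node["port"] = int(node["port"])
        nodes.append(node)
    return nodes


def disconnected(nodes):
    """Keep only the SDSs whose state starts with 'Disconnected'."""
    return [n for n in nodes if n["state"].startswith("Disconnected")]


# Sample copied from the output in this post.
SAMPLE = """\
Query-all-SDS returned 3 SDS nodes.
Protection Domain 4ed2eb6300000000 Name: Applico
SDS ID: a5361e3300000002 Name: SDS_192.168.201.24 State: Disconnected, Join-Pending IP: 192.168.201.24 Port: 7072 Version: 2.0.5014
SDS ID: a5361e3200000001 Name: SDS_192.168.201.14 State: Disconnected, Join-Pending IP: 192.168.201.14,192.168.202.12 Port: 7072 Version: 2.0.5014
SDS ID: a5361e3100000000 Name: SDS_192.168.201.34 State: Connected, Joined IP: 192.168.201.34,192.168.202.32 Port: 7072 Version: 2.0.5014
"""

if __name__ == "__main__":
    for node in disconnected(parse_sds_lines(SAMPLE)):
        print(node["name"], node["state"], ",".join(node["ips"]))
```

On a live system the text would come from running the command and capturing its output instead of from `SAMPLE`; other ScaleIO versions may format the line differently, in which case the pattern needs adjusting.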

"scli --query_sds_connectivity_status --protection_domain_name Applico" hangs with no response.


What can we do to resolve this issue?

3 Posts

June 17th, 2016 02:00

The root cause of this issue is as follows:

For 6TB disks, the memory allocation in the SDS is too small (the default value of tgt_mem__stmp_size_in_lbs is too small).

The solution:

  1. Add the line below to the conf.txt on the SDS:

tgt_mem__stmp_size_in_lbs=262144

  2. Restart the SDS.

Important note: This operation must be done on all the SDSs associated with the 6TB disks, and it must be performed one SDS at a time (not in parallel).
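
The conf.txt edit in step 1 could be scripted so that it is safe to run more than once on the same SDS. The following is a rough Python sketch, not an official procedure: the helper name `ensure_setting` is hypothetical, and the location of conf.txt is left to the caller because this post does not state its full path.

```python
from pathlib import Path

# Key and value taken from the solution above.
KEY = "tgt_mem__stmp_size_in_lbs"
VALUE = "262144"


def ensure_setting(conf_path, key=KEY, value=VALUE):
    """Set key=value in a key=value style config file.

    Replaces an existing entry for the key, or appends one if absent.
    Returns True if the file was changed, False if it was already correct.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    wanted = f"{key}={value}"

    new_lines = []
    found = False
    for line in lines:
        if line.split("=", 1)[0].strip() == key:
            if not found:
                new_lines.append(wanted)  # replace the first occurrence
            found = True                  # drop any later duplicates
        else:
            new_lines.append(line)
    if not found:
        new_lines.append(wanted)

    if new_lines == lines:
        return False  # nothing to do, leave the file untouched
    path.write_text("\n".join(new_lines) + "\n")
    return True
```

Calling `ensure_setting("<path to the SDS conf.txt>")` on one SDS, restarting that SDS, and waiting for it to rejoin before moving to the next keeps to the one-by-one requirement; the restart itself is deliberately left as a manual step here.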

306 Posts

May 19th, 2016 07:00

Hi,

Are both disconnected SDSs reachable at all? Did their IPs change, or was there any upgrade or maintenance done at that time?

Can you collect get_info information from both SDSs (run /opt/emc/scaleio/sds/diag/get_info.sh) and from the primary MDM (log into ScaleIO with "scaleio --login", then run /opt/emc/scaleio/mdm/diag/get_info.sh), and upload the 3 archives to the following FTP location:

https://ftp.emc.com/action/login?domain=ftp.emc.com&username=nC0Qxon4B&password=FA5gFA5FAg

Thank you, Pawel

3 Posts

May 19th, 2016 23:00

Hi,

Both disconnected SDSs are reachable.

There are even records in the MDM log when we restart the SDS machine:

3304  2016-05-19 13:59:08.412 SDS_RECONNECTED       INFO         SDS: SDS_192.168.201.24 (ID a5361e3300000002)

There were no IP changes, and no maintenance or upgrade was performed. The cluster was doing its regular full-time job, copying data to storage.


We uploaded the reports to the FTP location.


Thank you for the support!
