1 Rookie

20 Posts
November 14th, 2021 11:00

Replace PowerFlex disk in ESXi environment

Hello everyone,

is there an official guide or procedure for replacing a failed disk of a PowerFlex installation in an ESXi environment?

KR,
Jacopo

14 Posts

November 23rd, 2021 15:00

I don't think ESXi manages the disks in PowerFlex storage; they are separate components. You delete the bad disk from the PowerFlex GUI management interface, physically replace the disk, and add it back from the GUI.

45 Posts

November 29th, 2021 00:00

If the devices are managed/configured on the ESXi server using DirectPath, then carry out the steps as you mentioned.
Also check whether CloudLink is in use to encrypt the devices; if it is, you will need a few extra steps.

If you are using RDM mapping, a device is created on the SVM that points to the physical disk on the ESXi host, so removing and adding a device involves more steps.

Either way, DirectPath is the recommended best-practice method, and we strongly advise using it.

When replacing the disk, make sure you correctly correlate the serial number of the failed drive with its operating-system path.

One way to correlate the device path and serial number is to run the following command:
#udevadm info --query=all --name=/dev/sdX | grep SCSI
where /dev/sdX is the device path of your failed disk.
The command output includes the serial number for that device path.
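If you are not yet sure which /dev/sdX has failed, a small loop can print the serial for every disk at once. This is only a sketch: it assumes a Linux SDS/SVM with udevadm available, and it reads the ID_SERIAL_SHORT udev property rather than grepping for SCSI as above.

```shell
# Print "<device>  <serial>" for every /dev/sd* block device present,
# so the failed drive's serial can be matched against iDRAC.
for dev in /dev/sd[a-z]; do
  [ -b "$dev" ] || continue                      # skip if no such device
  serial=$(udevadm info --query=all --name="$dev" \
           | sed -n 's/^E: ID_SERIAL_SHORT=//p')
  printf '%s  %s\n' "$dev" "$serial"
done
```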

 

Then log in to iDRAC:
• In the navigation pane, select Storage > Overview, and then click Physical Disks.
iDRAC displays the list of hard drives installed on the PowerFlex node.
• Expand the + sign for the listed devices and search for the serial number of the failed device that you noted in the command above.
The Advanced Properties section displays the disk name of the failed device.
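If you prefer the command line to clicking through the iDRAC GUI, the same lookup can be sketched with racadm. The command and output format vary by iDRAC version, so treat the invocation and the grep context below as assumptions to verify on your system; SERIAL is a placeholder for the value from the udevadm step.

```shell
# Hypothetical CLI alternative: list physical disks via racadm and search
# the detailed output for the serial noted earlier.
SERIAL=WD123XYZ   # placeholder serial from the udevadm step
if command -v racadm >/dev/null 2>&1; then
  racadm storage get pdisks -o | grep -B 8 "$SERIAL"
else
  echo "racadm not available on this host"
fi
```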

 

After the disk has been physically replaced, add the disk back to the PowerFlex cluster:

Steps
1. Log back in to the PowerFlex GUI.
2. In the left pane, click Configuration > SDSs.
3. In the right pane, select the check box for the relevant SDS.
4. In the upper-right menu, click Add Device > Storage Device.
5. In the Add Storage Device to SDS dialog box, enter the relevant information:
o Path - /dev/X
o Name -
o Storage Pool - the name of the Storage Pool
o Ensure that the correct media type is selected.
6. Click Add Device > Add Devices.
7. In the pop-up window, click Dismiss.
8. Verify that the device was added correctly:
o In the Dashboard view, check that the disk status is Rebalancing.
o Wait for the rebuild process to finish.
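The GUI add in steps 4-6 above can also be sketched from the CLI. The flag names below follow the ScaleIO/PowerFlex scli, but the SDS IP, device path, and pool name are placeholders you must replace with your own values.

```shell
# Hypothetical CLI version of the "Add Device" flow; values are placeholders.
SDS_IP=XX.XX.XX.XX                       # the SDS that owns the new disk
DEV=/dev/disk/by-id/your-new-disk-id     # stable path for the replacement
POOL=your_storage_pool
if command -v scli >/dev/null 2>&1; then
  scli --add_sds_device --sds_ip "$SDS_IP" \
       --device_path "$DEV" --storage_pool_name "$POOL"
else
  echo "scli not available on this host"
fi
```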

I hope that helps.

14 Posts

November 29th, 2021 09:00

Using DirectPath is correct. It eliminates the latency from the hypervisor layer.

I think PowerFlex handles disks poorly. When adding a disk, it requires the user to enter the device path manually. Dell should improve the interface by presenting the available/unused disks in a drop-down menu instead of requiring manual path entry; figuring out a disk's path requires some skill with the Linux file system.

I don't recommend using /dev/sdX as the disk path. Doing so can be deadly when a disk fails. Linux identifies a logical disk by the path /dev/sdX: in general it assigns sda to the first disk in the system, sdb to the second, sdc to the third, and so on. Assigning a disk to PowerFlex by its /dev/sdX path is the easiest way, but it introduces a huge problem when a disk fails. For example, suppose you have a system with 5 disks and assign the path for each disk in PowerFlex like this:

Disk 1 - /dev/sda

Disk 2 - /dev/sdb

Disk 3 - /dev/sdc

Disk 4 - /dev/sdd

Disk 5 - /dev/sde

When everything is working, everything is happy.

However, suppose there is a power failure on the storage system and Disk 2 fails; disk failure during a power outage is very common. If the failed disk is merely unreadable but still detected by the disk controller, the problem is minimal: you just replace the disk with very little work. But if the failed disk becomes undetectable by the controller, you are going to have a huge problem, because Linux will reassign a new logical path to each of the remaining disks. After the reboot, the disk paths will look like this:

Disk 1 - /dev/sda

Disk 2 - missing/failed

Disk 3 - /dev/sdb

Disk 4 - /dev/sdc

Disk 5 - /dev/sdd

As you can see, because Disk 2 is missing, Linux shifts the logical disk IDs for disks 3, 4, and 5 after the reboot: the original Disk 3 at /dev/sdc becomes /dev/sdb. If you use /dev/sdX paths in PowerFlex, your disks will be scrambled after the reboot, and you will find disks 2, 3, 4, and 5 all offline. To prevent this situation, I suggest using /dev/disk/by-id/ paths instead. Because the by-id path encodes the drive's identity, Linux will always write to the correct disk after a disk failure, even if the disk has been moved to a different slot in the enclosure.
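To see these stable names on a node, you can list /dev/disk/by-id and the kernel name each link currently resolves to. This is a simple sketch; on a machine with no such links it just prints nothing.

```shell
# Print each stable by-id link and the device it currently points at.
# by-id names embed the drive model/serial, so they survive reboots and
# slot moves; the /dev/sdX target on the right is what can change.
for link in /dev/disk/by-id/*; do
  [ -e "$link" ] || continue
  printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
done
```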

1 Rookie

1 Message

April 29th, 2024 07:11

After removing the disk from the PowerFlex GUI and physically replacing it, I couldn't add it back from the GUI.

Log in to an SVM as admin and run the command below:

"

#scli --mdm_ip XX.XX.XX.XX --username admin --query_sds --sds_ip XX.XX.XX.XX

"

Supply the primary MDM IP and the SDS IP (the SDS in which the disk failed and was removed); both can be found in the PowerFlex GUI.

You will get a detailed output; locate the information related to the disks.

"

Device information (total 28 devices):

         1: Name: ScaleIO-26ab9bde  Path: /dev/sdx  Original-path: /dev/sdw  ID: ************

                Storage Pool: ************, Capacity: 2233 GB, State: Normal

                Scanned 56572178 MB, Unresolved Read errors: 0/0, Unresolved Compare errors: 0/0

         2: Name: ScaleIO-26ab9bdf  Path: /dev/sdy  Original-path: /dev/sdx  ID: ************

                Storage Pool: ************, Capacity: 2233 GB, State: Normal

                Scanned 56577001 MB, Unresolved Read errors: 0/0, Unresolved Compare errors: 0/0

         3: Name: ScaleIO-26ab9be0  Path: /dev/sdz  Original-path: /dev/sdy  ID: ************

                Storage Pool: ************, Capacity: 2233 GB, State: Normal

                Scanned 56570......................................

"

You can see that the "Path" and "Original-path" are not the same for one of the devices.
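A quick way to spot these mismatches in a long --query_sds listing is to filter for device lines whose Path and Original-path differ. This little helper is my own assumption about the output layout shown above, not an official tool.

```shell
# Print only device lines where Path and Original-path disagree.
# usage: scli --mdm_ip ... --query_sds --sds_ip ... | paths_mismatch
paths_mismatch() {
  awk '{
    path = ""; orig = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "Path:") path = $(i + 1)
      if ($i == "Original-path:") orig = $(i + 1)
    }
    if (path != "" && orig != "" && path != orig) print
  }'
}
```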

Run the command below to make the two paths match. It can be run safely, with no impact to a server running in production.

"

#scli --update_sds_original_paths --sds_id XXXXXXXXXXX

"

Run "#scli --query_all_sds" to get all the information about SDS IDs, IPs, names, and so on.

Then try adding the disk from PF GUI again.

Worked for me!!
