Unsolved

Closed

2 Posts

355

July 26th, 2023 15:00

PS6000 won't initialize / failed disks

Hello everyone.  Earlier this week we had to move our PS6000 to a different site.  After turning the device back on, we can't get it to fully boot.  We're faced with the "CLI >" prompt instead of the normal "MEMBERNAME >" prompt, and whenever we try to enter most commands, we see "The storage array is still initializing. Limited commands will be available until the initialization is complete. Please try again later."  We can't get past this.

We ran "support exec raidtool" and got this output:

 

Driver Status: Ok

RAID LUN 0 Faulted Beyond Recovery.

 15 Drives (0,f,4,6,8,10,12,?,3,5,7,9,11,15,f)

 RAID 6 (64KB sectPerSU)

 Capacity 12,482,512,879,616 bytes

Available Drives List: 1

Unavailable Drives:

2 (history of failure)

13 (history of failure)

14 (history of failure)

 

Also on initial boot, we get the following errors:

 

SP:1690411046.92:cache_driver.cc:1072:INFO:28.2.39:Active control module cache set to write-back mode.

SP:1690411049.07:ses.c:2554:WARNING:28.3.85:  Power cooling module 1 is turned off or not receiving power.

SP:1690411049.12:emm.c:2226:WARNING:28.3.51:Warning health conditions currently exist.

       Correct these conditions before they affect array operation.

       Power supply failure.

       There are 1 outstanding health conditions. Correct these conditions before they affect array operation.

SP:1690411049.24:emm.c:1288:INFO:28.2.6:Enclosure serial number: SHU0935411J1F84.

SP:1690411055.82:pclass.c:10108:WARNING:32.3.28:The drive at enclosure 0, drive 1 is not supported on this array. Contact your array support provider and provide the model number (ST31000528AS) and serial number (            5VP5MD02).

SP:1690411055.82:emm.c:2226:WARNING:28.3.51:Warning health conditions currently exist.

       Correct these conditions before they affect array operation.

       Power supply failure.

       An unauthorized drive has been detected and disabled.

       There are 2 outstanding health conditions. Correct these conditions before they affect array operation.

SP:1690411056.03:init.c:1003:WARNING:13.3.1:Drive 2 has a history of failure.

SP:1690411056.04:init.c:1003:WARNING:13.3.1:Drive 13 has a history of failure.

SP:1690411056.04:init.c:1003:WARNING:13.3.1:Drive 14 has a history of failure.

SP:1690411056.06:events.c:243:ERROR:14.4.21:0:The RAID LUN 0 is faulted beyond recovery.

SP:1690411056.06:emm.c:2226:ERROR:28.4.47:Critical health conditions exist.

       Correct immediately before they affect array operation.

       Drive array no longer functional due to multiple drive failures.

       There are 1 outstanding health conditions. Correct these conditions before they affect array operation.

SP:1690411056.46:mirror.c:2717:WARNING:13.3.29:Drive 1 has generated a SMART trip event, and will need to be replaced soon.

SP:1690411056.46:mirror.c:2717:WARNING:13.3.29:Drive 2 has generated a SMART trip event, and will need to be replaced soon.

SP:1690411056.46:mirror.c:2717:WARNING:13.3.29:Drive 8 has generated a SMART trip event, and will need to be replaced soon.

SP:1690411056.46:mirror.c:2717:WARNING:13.3.29:Drive 13 has generated a SMART trip event, and will need to be replaced soon.

SP:1690411056.46:mirror.c:2717:WARNING:13.3.29:Drive 14 has generated a SMART trip event, and will need to be replaced soon.

 

The power supply error is because we only had one PSU up at the time while troubleshooting, and the unauthorized disk error is because we tried to replace another failed drive with a new drive (not Dell EQL, though) to see if it would recover.

The messages read pretty doom and gloom, but is there any chance we can get this array back up and running?  Replace the failed drives with new Dells and get them rebuilt?

 

Thanks for the help!

Moderator

631 Posts

July 26th, 2023 23:00

Hi @modernferris,

 

From the drive list in the raidtool output, (0,f,4,6,8,10,12,?,3,5,7,9,11,15,f), you have two failed drives and one drive in an unknown state. RAID 6 tolerates two drive failures, so it might be able to rebuild, but with three drives currently in a bad state the RAID set would lose data. Yes, on EqualLogic you will need to use compatible EQL drives as replacements. First, try to revive the RAID container by reinserting the original drive that you swapped out. If the data becomes accessible, replace one of the failed drives (marked "f" in the list) with a compatible EQL drive. Do not replace the drive that you swapped out originally, as the RAID needs it to attempt the rebuild.
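The reading of the drive list above can be sketched in a few lines of Python. This is only an illustration: the function name and the returned keys are hypothetical, and it assumes, as in the reply above, that "f" marks a failed member and "?" a member in an unknown state. It is not a Dell tool and does not talk to the array.

```python
def summarize_raid_members(drive_list: str, parity_drives: int = 2) -> dict:
    """Count failed ('f') and unknown ('?') slots in a raidtool drive list.

    Assumption: 'f' = failed member, '?' = unknown state, as interpreted
    in the forum reply. parity_drives=2 reflects RAID 6.
    """
    entries = [e.strip() for e in drive_list.strip("() ").split(",")]
    failed = sum(1 for e in entries if e == "f")
    unknown = sum(1 for e in entries if e == "?")
    return {
        "members": len(entries),
        "failed": failed,
        "unknown": unknown,
        # RAID 6 survives at most two missing members.
        "within_tolerance": failed + unknown <= parity_drives,
        "within_tolerance_if_unknown_recovers": failed <= parity_drives,
    }


summary = summarize_raid_members("(0,f,4,6,8,10,12,?,3,5,7,9,11,15,f)")
print(summary)
```

For the list from the first post this reports 15 members, 2 failed and 1 unknown, which is why the set is only within RAID 6 tolerance if the "?" drive comes back.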

July 27th, 2023 01:00

Thanks for the reply!

I think the log messages were a bit confusing, because the EqualLogic wasn't in the original condition it was in when we first noticed the problem (i.e. both power supplies connected and all original drives in place).  I put the PS6000 back into the original config, and here's what we're getting.  Maybe this will make things a bit more clear for troubleshooting:

SP:1690472890.47:init.c:1003:WARNING:13.3.1:Drive 2 has a history of failure.
SP:1690472890.48:init.c:1003:WARNING:13.3.1:Drive 13 has a history of failure.
SP:1690472890.48:init.c:1003:WARNING:13.3.1:Drive 14 has a history of failure.
SP:1690472890.73:emm.c:2226:ERROR:28.4.47:Critical health conditions exist.
Correct immediately before they affect array operation.
Unable to recover battery-backed cache. Array will not initialize without intervention.
There are 1 outstanding health conditions. Correct these conditions before they affect array operation.
SP:1690472891.88:mirror.c:2717:WARNING:13.3.29:Drive 2 has generated a SMART trip event, and will need to be replaced soon.
SP:1690472891.88:mirror.c:2717:WARNING:13.3.29:Drive 8 has generated a SMART trip event, and will need to be replaced soon.
SP:1690472891.88:mirror.c:2717:WARNING:13.3.29:Drive 13 has generated a SMART trip event, and will need to be replaced soon.
SP:1690472891.88:mirror.c:2717:WARNING:13.3.29:Drive 14 has generated a SMART trip event, and will need to be replaced soon.

CLI> support exec "raidtool"
You are running a support command, which is normally restricted to PS Series Technical Support personnel. Do not use a support command without instruction from Technical Support.
Driver Status: *Admin Intervention Requested*
RAID LUN 0 Degraded.
raid status unrecoverable.
15 Drives (0,f,4,6,8,10,12,1,3,5,7,9,11,15,f)
RAID 6 (64KB sectPerSU)
Capacity 12,482,512,879,616 bytes
Unavailable Drives:
2 (history of failure)
13 (history of failure)
14 (history of failure)

Again, this is the original config the PS6000 was in when we first noticed the problem.  All original drives are in place, and both PSUs are connected. The PS6000 still won't initialize, and we still can't access the data. Also to note, the raidtool message says there are only 15 drives, but we have 16 installed.  All 1TB EQL drives.

Thoughts on being able to recover this?

Thanks!
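For anyone comparing the two warning types in the boot log pasted above, here is a small Python sketch that cross-references drives flagged with a "history of failure" against drives reporting a SMART trip. The line layout (`SP:<timestamp>:<file>:<line>:<level>:<code>:<message>`) is inferred from the pasted output, and treating the timestamp as Unix epoch seconds is an assumption; all function and variable names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the boot log in this post.
LOG = """\
SP:1690472890.47:init.c:1003:WARNING:13.3.1:Drive 2 has a history of failure.
SP:1690472890.48:init.c:1003:WARNING:13.3.1:Drive 13 has a history of failure.
SP:1690472890.48:init.c:1003:WARNING:13.3.1:Drive 14 has a history of failure.
SP:1690472890.73:emm.c:2226:ERROR:28.4.47:Critical health conditions exist.
Correct immediately before they affect array operation.
SP:1690472891.88:mirror.c:2717:WARNING:13.3.29:Drive 2 has generated a SMART trip event, and will need to be replaced soon.
SP:1690472891.88:mirror.c:2717:WARNING:13.3.29:Drive 8 has generated a SMART trip event, and will need to be replaced soon.
SP:1690472891.88:mirror.c:2717:WARNING:13.3.29:Drive 13 has generated a SMART trip event, and will need to be replaced soon.
SP:1690472891.88:mirror.c:2717:WARNING:13.3.29:Drive 14 has generated a SMART trip event, and will need to be replaced soon.
"""

# Field layout inferred from the pasted output (an assumption).
LINE_RE = re.compile(
    r"^SP:(?P<ts>\d+\.\d+):(?P<src>[^:]+):(?P<line>\d+):"
    r"(?P<level>[A-Z]+):(?P<code>[\d.]+):(?P<msg>.*)$"
)


def parse(log: str) -> list:
    """Return one dict per 'SP:' line; continuation lines are skipped."""
    events = []
    for raw in log.splitlines():
        m = LINE_RE.match(raw.strip())
        if m:
            events.append(m.groupdict())
    return events


def drives_matching(events: list, pattern: str) -> set:
    """Collect drive numbers whose message matches the given pattern."""
    found = set()
    for ev in events:
        m = re.search(pattern, ev["msg"])
        if m:
            found.add(int(m.group(1)))
    return found


events = parse(LOG)
history = drives_matching(events, r"Drive (\d+) has a history of failure")
smart = drives_matching(events, r"Drive (\d+) has generated a SMART")
smart_only = sorted(smart - history)

# Assumption: the second field is Unix epoch seconds.
first_seen = datetime.fromtimestamp(float(events[0]["ts"]), tz=timezone.utc)

print("history of failure:", sorted(history))
print("SMART trip only:", smart_only)
print("first event (UTC, if epoch):", first_seen.isoformat())
```

On this log, drives 2, 13 and 14 carry both warnings, while drive 8 shows a SMART trip without a history of failure, which may be worth mentioning to support since it is still listed as a RAID member.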

3 Apprentice

1.5K Posts

July 27th, 2023 02:00

Hello, 

  Re: CLI.  Are you connected to the serial port?  It sounds like you are connected to the passive controller.

 Re: 15 drives. One was a spare, which is not counted among the drives included in a RAIDset.  A spare does not have a label indicating which RAIDset it belongs to.  When a member drive fails, the spare gets a label and is no longer a spare. 

  Regards, 

Don 
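The 15-versus-16 count can be sanity-checked against the capacity that raidtool reports. The sketch below is plain arithmetic in Python; it assumes the reported capacity is the usable (data-only) capacity of the RAID 6 set, which the output does not state explicitly, and the variable names are hypothetical.

```python
RAID6_PARITY_DRIVES = 2

reported_capacity = 12_482_512_879_616  # bytes, from the raidtool output
raid_members = 15                       # "15 Drives" in the raidtool output
installed_drives = 16                   # drives physically in the PS6000

# RAID 6 spends two drives' worth of space on parity.
data_drives = raid_members - RAID6_PARITY_DRIVES
per_drive_bytes, remainder = divmod(reported_capacity, data_drives)
spares = installed_drives - raid_members

print(f"data drives: {data_drives}")
print(f"usable bytes per drive: {per_drive_bytes:,} (remainder {remainder})")
print(f"drives outside the RAID set (spare): {spares}")
```

The capacity divides evenly across 13 data drives at 960,193,298,432 bytes each, which is consistent with 1TB drives, 15 RAID members and one spare.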

Moderator

631 Posts

July 27th, 2023 02:00

Hi @modernferris,

 

It is possible that the missing disk is 13. It could be in a foreign or missing state, which raidtool would not be able to display. If that is the case, you have three unhealthy drives in your RAID. Initially disk 1 was the one shown as "?", and it has since recovered, but disks 2, 13 and 14 are still in a bad state. 

 

My suggestion is to contact support and open a support case so that L2 support can check your controller's CLI and confirm the state of all three disks. The commands needed are only available when L2 support is connected to your storage. 
