Unsolved

12 Posts

August 16th, 2020 04:00

Dell VRTX SPERC 2 failed, disks not accessible: Backplane 1 SAS_2A and SAS_2B cables are disconnected

Hello Folks,

I got the critical alerts listed below on my Dell VRTX chassis.

Chassis: The 12V storage voltage is outside of range

Chassis: The 12V FABB voltage is outside of range

Chassis: Power Supply redundancy is lost

perc-2 Backplane 1 SAS_2B cable is disconnected

perc-2 Backplane 1 SAS_2A cable is disconnected

perc-2 Chassis Integrated RAID 2 is no longer fault tolerant because the peer controller is not available.

Note: I have three M630 blades in my VRTX and none of them can connect to the storage disks. I think this is due to the failure of one PERC controller. How can I move the storage configuration to the other PERC controller, which is still available?

Appreciate your suggestions

 

Moderator

 • 

790 Posts

August 17th, 2020 01:00

Hi Arif.

 

You can check the option in the CMC menu; I can't see it from here.

Log in to the CMC and go to Storage --> Controllers. There you will see Fault Tolerant Mode. If it shows Active/Passive, one controller is active and the other one will step in when there is an issue.

 

I checked the CMC log, but there is no indication of a PERC failure; even the alerts that you mentioned are not listed there.

 

What I see is this:

Thu Aug 13 2020 10:34:06 The 12V Storage voltage is outside of range.
Thu Aug 13 2020 10:34:05 The 12V FABB voltage is outside of range.
Thu Aug 13 2020 10:34:04 Power supply 2 failed.
Thu Aug 13 2020 10:34:01 Power supply 4 failed.
Thu Aug 13 2020 10:33:59 Power supply 3 failed.
Thu Aug 13 2020 10:33:59 Power supply 1 failed.

 

Was there a power outage or did someone power off the system around 10:33 and power back on at 11:13?
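
For anyone who wants to check the timing themselves, here is a minimal Python sketch that parses the six hardware log lines quoted above and reports how closely together the PSU failures occurred. It assumes the log was saved as plain text in exactly the format shown; the helper names `parse_line` and `psu_failure_window` are made up for this illustration and are not part of any Dell tool.

```python
from datetime import datetime

# Log lines copied from the CMC hardware log excerpt quoted in this post.
LOG = """\
Thu Aug 13 2020 10:34:06 The 12V Storage voltage is outside of range.
Thu Aug 13 2020 10:34:05 The 12V FABB voltage is outside of range.
Thu Aug 13 2020 10:34:04 Power supply 2 failed.
Thu Aug 13 2020 10:34:01 Power supply 4 failed.
Thu Aug 13 2020 10:33:59 Power supply 3 failed.
Thu Aug 13 2020 10:33:59 Power supply 1 failed.
"""


def parse_line(line):
    """Split one log line into (timestamp, message)."""
    # Fields: weekday, month, day, year, time, then the free-text message.
    parts = line.split(" ", 5)
    stamp = datetime.strptime(" ".join(parts[:5]), "%a %b %d %Y %H:%M:%S")
    return stamp, parts[5]


def psu_failure_window(text):
    """Return (first, last, count) for the 'Power supply N failed.' events."""
    events = [parse_line(line) for line in text.splitlines() if line.strip()]
    stamps = sorted(
        ts for ts, msg in events
        if msg.startswith("Power supply") and msg.endswith("failed.")
    )
    return stamps[0], stamps[-1], len(stamps)


first, last, count = psu_failure_window(LOG)
gap = (last - first).total_seconds()
print(f"{count} PSU failures between {first:%H:%M:%S} and {last:%H:%M:%S} ({gap:.0f} s apart)")
```

All four supplies reporting a failure within a few seconds of each other is more consistent with a loss of input power than with four independent hardware faults, which is why the question about an outage matters.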

 

Except for these messages, all looks fine to me.

 

Reseating may solve the issue:

1. Open the system.
2. Check the storage controller card indicators. If the power indicator blinks irregularly or the attention indicator blinks amber, it indicates a fault condition.
3. Turn off the system and attached peripherals, and disconnect the system from the electrical outlet.
4. Reseat the integrated storage controller card, SAS cables, and the storage controller battery.
5. If the storage controller functions properly, close the system, reconnect it to the electrical outlet, and turn the system on.
6. If the storage controller does not function properly, log in to the CMC web interface and view the properties of the storage controller.

 

You may also check the Log via the CMC WebGUI like this:

To view the hardware logs using CMC Web interface, in the left pane, click Chassis Overview → Logs.
The HardwareLog page is displayed.
To save a copy of the hardware log to your managed station or network, click Save Log, and then specify a location for a text file of the log.

 

After you have stored a copy on your system, please clear the log and check if the system goes back to normal mode.

To clear the hardware log, click Clear Log on the CMC HardwareLog page in the WebGUI.

 

Let me know the results.

 

Best regards,
Stefan

 

August 17th, 2020 01:00

Hi Stefan,

Thanks for the reply !

Yes, I have exported the logs from the CMC and the sPERC cards. Here is the link: https://drive.google.com/drive/folders/1IZlGhd_vTKQzH5oURkhISmCzpw7ZrlBR?usp=sharing

How to check the policy ?

Can you please check which policy is active 1 or 2 and also check the CMC log if there was an issue before with the first PERC

Thanks,

Arif

Moderator

 • 

790 Posts

August 17th, 2020 01:00

Hi arifsohail2020.

 

It looks like there was an issue with the power before the PERC stopped working. We should check that too, but here is some information about the sPERC.

 

There are two options on how to use the controller.

 

Opt. 1: Dual Fault Tolerant Shared PERC 8 Card Configuration 

When the active controller stops functioning, the passive controller acts as a hot-spare and takes over the functions of the active controller. 

Opt. 2: Dual Non-Fault Tolerant Shared PERC 8 External Card Configuration 

Both controller cards act as single, independent controllers; there is no redundancy.

 

The error messages you posted show that PERC-2 stopped working, so PERC-1 should take over, provided the first option is active and that controller did not fail earlier.
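
To make the difference between the two options concrete, here is a toy Python model. It is only a sketch of the reasoning above, not how the CMC or the Shared PERC 8 firmware is implemented; the `Mode` enum, the `reachable` function, and the controller names are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Mode(Enum):
    FAULT_TOLERANT = 1      # Opt. 1: active/passive pair, the peer takes over
    NON_FAULT_TOLERANT = 2  # Opt. 2: two independent controllers, no takeover


def reachable(mode, health):
    """Return the controllers whose storage should still be reachable.

    health maps a controller name to True (healthy) or False (failed).
    Simplifying assumption: in fault tolerant mode one healthy controller
    can serve everything; otherwise each controller serves only its own storage.
    """
    healthy = {name for name, ok in health.items() if ok}
    if mode is Mode.FAULT_TOLERANT:
        return set(health) if healthy else set()
    return healthy


state = {"PERC-1": True, "PERC-2": False}
print(sorted(reachable(Mode.FAULT_TOLERANT, state)))      # both paths covered by PERC-1
print(sorted(reachable(Mode.NON_FAULT_TOLERANT, state)))  # only PERC-1's own storage
```

Under this model, a failed PERC-2 with a healthy PERC-1 should only cost access to storage when the second option is configured, which is why the active policy is worth checking first.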

 

Can you please check which policy is active, 1 or 2, and also check the CMC log for any earlier issue with the first PERC?
Here is an article that shows how to export the file:

PowerEdge Server: How to generate enclosure logs for CMC/VRTX/FX2 (https://dell.to/2Cz1ajj)

 

In addition, you can reseat the controller cards to ensure a good contact with the motherboard - but please first export the log files.

 

Let me know what you find.

 

Best regards,
Stefan

 

August 17th, 2020 02:00

Hi Stefan,

Yes, there was a power fluctuation last week.

I will do the reseat and also the firmware update for "Shared PERC 8; Backplane Expander Board and CMC".

I have collected the logs from a PuTTY session. Can you please review them? https://drive.google.com/drive/folders/1IZlGhd_vTKQzH5oURkhISmCzpw7ZrlBR?usp=sharing

Thanks,

Arif

Moderator

 • 

790 Posts

August 17th, 2020 02:00

Hi Arif,

 

What happened to PSU2? - The 12V FABB voltage is outside of range.

 

I'm wondering about your power configuration. I see that the input power is 534 W at the moment. What kind of PSUs are installed? What I mean is that there may not be enough power available in the system to run all the components.

 

Next step, please make sure that all the firmware is up to date. https://dell.to/323aepi

 

Check for: Shared PERC 8; Backplane Expander Board and CMC.

 

 

 

August 17th, 2020 02:00

Hi Stefan,

Thanks for your reply !

Yes, I will reseat the controller. Can you check this link? https://drive.google.com/drive/folders/1IZlGhd_vTKQzH5oURkhISmCzpw7ZrlBR?usp=sharing

I have also attached images from the CMC where the controller and disks are missing.

Thanks,

Arif

Moderator

 • 

790 Posts

August 17th, 2020 03:00

Hey,


OK, then please follow the steps provided and let me know what changes.

Is this system still under warranty?

 

BR

Stefan

 

August 17th, 2020 03:00

Hi Stefan,

Is PERC-2 still listed under CMC -> Storage -> Controllers -> Troubleshooting? If yes, please Export the TTY Log for me.

-------- PERC-2 is not listed 


You may also check the assignment of VD to the server, maybe this is not set correctly.

In CMC go to Storage -> Virtual Disk -> Assign, you will see a list of your VDs and how the single assignments are set.

--------- Virtual disks are not listed

### As you suggested, we have to reseat the PERC-2 controller and check the status on the CMC.

### We also have to update the firmware after the reseat.

Moderator

 • 

790 Posts

August 17th, 2020 03:00

Hi Arif,

 

Thanks for all the log files. Power can be ruled out: I see four 1600 W PSUs online in the power log, so you are fine there.

So we need to focus on PERC-2. I haven't found any issue with PERC-1, so this one should work. As your screenshot shows, it is green, so it is healthy.
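
As a rough cross-check of the power numbers in this thread, the Python sketch below compares the 534 W input reading against four 1600 W supplies. The function name `psu_headroom` and the `reserved_psus` parameter are invented for this example, and the choice to hold two supplies back for redundancy is an assumption; the actual redundancy policy on this chassis is not stated anywhere in the thread. It also treats input watts and PSU rating as directly comparable, which ignores conversion efficiency.

```python
def psu_headroom(psu_count, psu_watts, input_watts, reserved_psus=0):
    """Return (usable_watts, spare_watts, load_fraction).

    reserved_psus is the number of supplies held back for redundancy.
    This is an assumed policy, not something read from the chassis.
    """
    usable = (psu_count - reserved_psus) * psu_watts
    return usable, usable - input_watts, input_watts / usable


# Figures from the thread: four 1600 W PSUs online, 534 W input power.
usable, spare, load = psu_headroom(4, 1600, 534, reserved_psus=2)
print(f"usable {usable} W, spare {spare} W, load {load:.1%}")
```

Even with half of the supplies reserved, the reported draw is a small fraction of the usable capacity, which matches the conclusion that the power budget is not the problem.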

 

Is PERC-2 still listed under CMC -> Storage -> Controllers -> Troubleshooting? If yes, please Export the TTY Log for me.

 


You may also check the assignment of VD to the server, maybe this is not set correctly.

In CMC go to Storage -> Virtual Disk -> Assign, you will see a list of your VDs and how the single assignments are set.

 

August 17th, 2020 03:00

Hey Stefan,

I will check the options and let you know what changes.

Unfortunately, the warranty has expired.

Thanks,

Arif

August 17th, 2020 04:00

Hey,

Yes, I will let you know what changes.

The hardware is out of warranty.

Thanks,

Arif

Moderator

 • 

790 Posts

August 17th, 2020 05:00

Hi Arif,

 

I found two:

 

0HX53 ASSY,CRD,CTL,SPERC8-E,LPF Controller Card Assembly, SPERC 8 External (Low Profile)
P3WV4 ASSY,CRD,CTL,SPERC8,1GB,VRTX ASSEMBLY, Card, CTL, SPERC8, 1GB, VRTX

 

You should check the Dell Part Number (DPN) on the defective PERC.

 

And here is a list of steps to replace the controller.

 

 

Shared PERC8 Part Replacement Procedure

1. Note the firmware version of the current Shared PERC 8 Controller(s). If the system has 2 Shared PERC 8 Controllers, they must be running the same firmware. The firmware version can be found in the CMC GUI by clicking on Chassis Overview > Storage > Controllers, then expanding the controller properties section.
2. Power off all the server modules.
3. Power off the PowerEdge VRTX system.
4. Remove the server modules and the shared storage hard drives from the PowerEdge VRTX system. Label all server modules and hard drives before removal so that they can be replaced in the same slots later.
5. Replace the defective Shared PERC 8 controller.
6. Power on the PowerEdge VRTX system, WITHOUT hard drives and server modules inserted.
7. Wait for the PowerEdge VRTX system to power on completely. This can be confirmed from the CMC GUI by clicking on Chassis Overview > Power > Control and checking that the Power State is ON.
8. The PowerEdge VRTX storage subsystem may take up to 25 minutes to come online.
9. Confirm that the replacement Shared PERC 8 controller has the same firmware version as the controller that was replaced. If the system has 2 Shared PERC 8 Controllers, they must be running the same firmware.
10. If the old Shared PERC 8 Controller firmware version is unknown, the controller should be updated to the latest version on https://dell.to/346MNOB.
11. If the system has two Shared PERC 8 Controllers, confirm the Fault Tolerance properties are healthy. This is necessary to ensure that any new firmware has been initialized and is compatible before you reinsert the shared hard drives and modular servers.
12. Turn off the PowerEdge VRTX system.
13. Insert the shared storage hard drives (that were removed earlier) into their original slots.
14. Turn on the PowerEdge VRTX system.
15. Confirm the Virtual Disk Layout and the Virtual Disk Assignments in the CMC GUI. If the virtual disks are not imported, not present, or the virtual disk assignments are not present or wrong, contact Dell Technical Support.
16. Turn off the PowerEdge VRTX system.
17. Insert the server modules (that were removed earlier) into their original slots.
18. Turn on the PowerEdge VRTX system.
19. Turn on the server modules.
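
The firmware comparison in steps 1 and 9 can be written down as a small check. This is only a sketch: the function name `firmware_mismatch` is made up, and the version strings are placeholders typed in by hand after reading them from the controller properties in the CMC GUI, not real Shared PERC 8 firmware versions.

```python
def firmware_mismatch(versions):
    """Return True if the controllers report different firmware versions.

    versions maps a controller name to the version string noted from the
    CMC GUI. Surrounding whitespace is ignored before comparing.
    """
    return len({v.strip() for v in versions.values()}) > 1


# Placeholder version strings, for illustration only.
noted = {"PERC-1": "1.0.0", "PERC-2": "1.0.1"}
if firmware_mismatch(noted):
    print("Firmware differs between controllers; align versions before reinserting drives.")
else:
    print("Firmware versions match.")
```

Writing the versions down before the swap (step 1) is what makes this comparison possible afterwards; if the old version was never recorded, step 10 applies instead.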

 

Best regards,
Stefan

 

August 17th, 2020 05:00

Hey Stefan,

Can you suggest the part number of the PERC-2 controller in case we have to replace it?

Thanks,

Arif

August 17th, 2020 05:00

Thank you, Stefan. I found the SPERC parts below in my system configuration on the Dell website.

8X25K ASSY,CRD,BKPLN,XPNDR,VRTX,V2 2
NGPXG INFO,ROYALTY,DUAL,SPERC8 0
NV6R0 ASSY,CBL,XPNDR,DUO,2.5,V2,VRTX 1
P3WV4 ASSY,CRD,CTL,SPERC8,1GB,VRTX 2
PF7J8 SRV,DRVR,SPERC8,VRTX,SAS-RAID 0

1 Rookie

 • 

21 Posts

August 19th, 2020 04:00

Your problem seems similar to one I had with one of our VRTX systems in Spain ...

CMC firmware is 3.30 ... after a firmware update of one of the blades, one PERC left fault tolerance mode ... after that the system somehow switched to the second PERC and my storage access was gone (no fault was logged in the hardware log).

I did a complete shutdown of the VRTX (no problem, because the ESX no longer had access to the storage anyway ...)

After powering back on, one PERC was still gone ... did a reset/failover to the second CMC ... Hurrah, the PERC was back again ... but still no storage visible on the ESX ...

Solution was ... the mapping of the virtual disk had somehow been lost ... I re-presented the virtual disk to the blades and then ... all fine!

 

Caution: I saw that strange PERC behavior on some of my other VRTX systems too ... the message is always ... PERC1 left fault tolerant group ... CMC failover and it is back again. That happened on systems where I did BIOS updates on the M640 blades and a CPLD update ... CMC was version 3.30 every time.

With CMC 3.21 and earlier I never had any issue.
