Unsolved

November 11th, 2023 07:21

RAID 5 and Precision 3660 Workstation

I am using a Precision 3660 workstation for an on-demand/periodic data storage and retrieval application.

There are three SSDs, 1 TB each, configured as software RAID 5. There are three partitions.

Recently the OS crashed while the machine was running, and I have been unable to boot the system since.

Under the F12 -> Diagnostics menu, one drive shows a status of FAILED and is listed in the non-RAID member section. The other two drives are healthy and are listed as RAID members.

Having faced this scenario multiple times, my usual process for bringing the machine back is to delete the RAID 5 array, reinstall Windows (10 Pro), restore from the system image, and redo whatever is missing (the data on the D and E partitions is not recovered in this scenario).

The downtime is long, and the business is severely affected as a result.

Someone suggested taking a manual backup of these drives before deleting the RAID 5 array, but I am looking for a faster, more reliable storage architecture.

Given this scenario, I have a few questions:

1. What is the best method to bring my system back (using the system image)?

2. Please suggest an alternative system design that provides OS redundancy (if the OS crashes, a backup is available) and makes it easy to rebuild the RAID without data loss after a failure.

3. I want to understand whether the OS corruption is related to the software RAID 5 configuration in any way, because the machine is strictly kept away from internet and pen-drive access. Also, the operators avoid forcing the machine to shut down or start.

My application is write-intensive. Also, there was no hardware failure in this case or before.

Details of the RAID volume after failure -

4 Operator

November 20th, 2023 21:28

Hello,

Why not simply rebuild the array? That is, remove the failed drive, put a new one in its place, and rebuild using the option that should be present in that screenshot.

The idea of RAID 5 is that the whole array can be rebuilt as long as only one disk fails.
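
To show why a single failed disk is recoverable, here is a minimal Python sketch of the XOR parity idea behind RAID 5. It is only an illustration, not how the workstation's RAID software is actually implemented: the helper `xor_blocks` and the sample blocks are made up, and a real array rotates parity across the drives and works on much larger stripes.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across three drives: two data blocks plus one parity block.
# The contents are hypothetical sample data.
data_a = b"RAID"
data_b = b"5 ok"
parity = xor_blocks(data_a, data_b)

# Suppose the drive holding data_b fails. XOR-ing the two surviving
# blocks gives back the missing one, which is what a rebuild onto a
# replacement drive does for every stripe.
rebuilt_b = xor_blocks(data_a, parity)
assert rebuilt_b == data_b
```

The same arithmetic shows the limit: with two blocks of a stripe missing, one XOR equation cannot recover both, so a second failure before the rebuild finishes loses the array.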

If your application is write-intensive, don't be too surprised when an NVMe drive dies. Depending on model and brand, each has a maximum amount of data that can be written before it gives up. So how intensive are we talking, in TB of data written?
And if it really is intensive, consider U.2 enterprise SSDs; those are made to endure real write loads.
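
For a rough feel of what the endurance limit means, here is a small Python sketch that turns a rated TBW (terabytes written) figure and a daily write volume into an expected lifetime. The function name and both numbers are hypothetical placeholders; take the real rating from the drive's spec sheet and the real write volume from its SMART data. Keep in mind that in RAID 5 every write also updates parity, so the drives see more writes than the application issues.

```python
def years_until_tbw(rated_tbw: float, writes_tb_per_day: float) -> float:
    """Years until a drive reaches its rated terabytes-written figure."""
    if writes_tb_per_day <= 0:
        raise ValueError("daily write volume must be positive")
    return rated_tbw / writes_tb_per_day / 365

# Hypothetical figures: a 600 TBW rating and 1.5 TB written to the drive per day.
estimate = years_until_tbw(600, 1.5)
print(f"about {estimate:.1f} years")  # prints "about 1.1 years"
```

This is a linear estimate only; a drive can fail earlier or last longer than its rating suggests.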
