February 25th, 2016 02:00

Problem on an AX4-5f

Hello,

My Fibre Channel storage array is equipped with two SPSs, one for each SP. The SPS serving SPB failed, and I replaced it.
After I rebooted the entire array, SPB appears as defective and the SPS connected to it shows the state: empty

Is it the current status of the SPS that causes SPB to be reported as defective?
Is there a specific procedure needed to make the SPS operational?

Thank you for your help

195 Posts

February 29th, 2016 11:00

As you are using them, the four vault disks have two distinct types of data on them:

> User data from the LUNs you defined there

> Flare/vault data that the system uses

That second type of data can only exist on disks in slots 0-3. Think of them as the boot drives for the system; their use is identified by their physical location. When your disks failed, and were rebuilt to the designated hot spares, only the first type of data was rebuilt.

Your system hasn't been healthy since that first vault drive failed.  For instance, I believe that write cache has been disabled since that point in time.  Until you can return it to a healthy status I wouldn't expect it to function in a normal manner.

Having put disks back into slots 1 and 3, I would guess that it would be re-syncing from the invoked hot spares.  As these are SATA disks there is a chance that their issues were with internal drive recovery, and they may be essentially 'good'.  But I would think that the rebuild was still in progress, and that one or both of them may fail before it finishes.

What alerts do you still have?  Just the SPS, or do you still see issues with the SPs and write cache disabled?

On a small capacity system it is a bit of a luxury, but if possible, you should avoid using the vault disks for user LUNs.
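As a rough illustration of that advice, here is a minimal Python sketch that flags disk pools sharing slots with the vault. It is not an EMC or Navisphere tool; `VAULT_SLOTS`, `pools_on_vault` and the `layout` dictionary are hypothetical names, and the layout assumes the pools the original poster describes in this thread (pool 1 on disks 0-3, pool 2 on disks 5-10).

```python
# Hypothetical helper for illustration only; not an EMC or Navisphere API.
# Vault slots are 0-3 of the base enclosure, as described in this thread.
VAULT_SLOTS = frozenset(range(0, 4))


def pools_on_vault(pools):
    """Return {pool name: sorted vault slots it uses} for pools touching the vault."""
    overlaps = {}
    for name, slots in pools.items():
        shared = sorted(VAULT_SLOTS & set(slots))
        if shared:
            overlaps[name] = shared
    return overlaps


# Assumed layout, taken from the original poster's description.
layout = {
    "pool 1": range(0, 4),   # disks 0-3
    "pool 2": range(5, 11),  # disks 5-10
}

print(pools_on_vault(layout))  # {'pool 1': [0, 1, 2, 3]}
```

With that layout only pool 1 is reported, which matches the situation here: the user LUNs in the first pool sit on the same disks as the vault data.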

As I read it, your current vault drives are 750GB SATA, and the other eight drives are 1TB SATA. If disk 1 or 3 should fail you *could* unbind your remaining hot spare and physically replace the failed disk with it. That is perhaps an imperfect solution, but only because historically the 1TB SATA disks have what is perhaps the worst failure rate of any disk made in this century. Otherwise, replacing a vault disk with a larger-capacity disk of the same interface (SATA for SATA) will not cause any issues, and you can go to wherever you get your replacement disks and buy another 1TB for just about the same cost as the 750GB.

4.5K Posts

February 25th, 2016 14:00

There should be some indicator lights on the SPS showing the state of the battery. You should let the SPS fully charge until you get the green light. Then ensure that you have the sense cable plugged into the correct SP - try re-seating it on both ends - and check the other cable on the good SPS for comparison.

Once the battery is fully charged, then try re-booting the array.

glen

6 Posts

February 26th, 2016 03:00

Hello and thanks for your help.

I checked the connections on both ends of the sense cable and everything is correct.

Only the Active light is ON, and it is green. The other lights are OFF.

This screenshot shows the status of the storage array:

status.JPG.jpg

And this one shows the 'attention required' information:

problem.JPG.jpg

At the beginning, there was a problem with two disks in my first disk pool (disks 1 and 3). The system used spare disks 4 and 11 to keep the LUNs in this pool active. After that, I had a problem with a failed SPS, the one connected to SPB.

I bought two new disks and a new SPS, and replaced the failed SPS with the new one (after stopping SPB). I then installed the new disks in slots 1 and 3. When I did this, I lost the Navisphere connection to the storage array, even after rebooting it.

The current status has stayed the same since I removed the new disks and restarted the array.

4.5K Posts

February 26th, 2016 09:00

When you have two disks fail at the same time and those two disks are in the same storage pool, you get what's called a double-faulted RAID group (on the AX series the Storage Pools are RAID Groups). The two disks that faulted (1 and 3) are part of the OS drives and control SPB (disks 1 and 3 are a mirror set). You need to put the disks back in the exact location they came out of - as these are the OS disks there should be special labels on the disks.

When two disks in a mirror both fail, it's very unlikely that this can be fixed without extensive work - any user data on those disks may be lost and the disks will need to be rebuilt (re-imaged) to restore the OS.

From what I can see only SPA is still alive and is probably the only SP that you could communicate with. The disk faults on the SPB side caused SPB to panic, which also probably caused the SPS fault.

You'll probably need assistance from a Partner or EMC to help resolve this issue.

Make sure you keep the two original disks (1 and 3) separated from the other disks and make sure you know which slot each disk belongs to, as this may help restore the SPB side faster.

glen

6 Posts

February 28th, 2016 23:00

Thanks for your help, Glen!

Disks 1 and 3 did not fail at the same time. They were replaced, one after the other, by disks 4 and 11. It took me a long time to buy new disks, which is why two disks are missing from the first pool.

When I tried to add the two new disks in slots 1 and 3, I thought they would take over from spare disks 4 and 11 after the rebuild. Maybe I should try this one disk at a time, and not both at the same time as I did before.

6 Posts

February 29th, 2016 08:00

Thanks for your answer Zaphod.

My array was defined like this:

1st pool: disks 0-3 (RAID 5 with 4 x 687 GB disks) ===> 2 LUNs

2nd pool: disks 5-10 (RAID 5 with 6 x 917 GB disks) ===> 3 LUNs

Disk 4 (687 GB): spare disk for pool 1

Disk 11 (917 GB): spare disk for pool 2
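For reference, the arithmetic behind this layout can be sketched in a few lines of Python. This is only an illustration: `raid5_usable_gb` and `spare_can_cover` are hypothetical helpers, the capacity figure is plain RAID 5 arithmetic (one disk's worth of parity) and ignores any space the system reserves on the vault disks, and the rule that a spare must be at least as large as the disk it stands in for is an assumption, since the thread does not say how the AX4-5 selects spares.

```python
def raid5_usable_gb(disk_count, disk_gb):
    """Raw RAID 5 capacity: one disk's worth of space goes to parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disk_count - 1) * disk_gb


def spare_can_cover(spare_gb, failed_gb):
    """Assumed rule: a spare must be at least as large as the disk it replaces."""
    return spare_gb >= failed_gb


print(raid5_usable_gb(4, 687))    # pool 1: 2061
print(raid5_usable_gb(6, 917))    # pool 2: 4585
print(spare_can_cover(917, 687))  # disk 11 standing in for a pool 1 disk: True
print(spare_can_cover(687, 917))  # disk 4 standing in for a pool 2 disk: False
```

Under that assumption, the larger spare (disk 11) can stand in for a failed disk in either pool, while disk 4 can only cover pool 1.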

The first disk to fail was disk 1, which was replaced by disk 4. One week later, disk 3 failed and was replaced by disk 11.

The array worked fine in this configuration until I bought new disks. When I replaced the failed disks with the new ones, I lost the connection to Navisphere. After reading about this problem in the EMC users FAQ, I tried removing the new disks. With them removed, I could access the array through Navisphere again, in the state shown in the screenshots above.

This morning I installed the failed disks (1 and 3) in their original slots, and here is the current state of the array:

Capture_ax4.JPG.jpg

As you can see, I was able to redefine disk 11 as a spare disk for the second pool.

On the other hand, I cannot recreate the first pool with disks 0 to 3.

Navisphere answers: The creation of the disk pool failed: Error reported by storage processor A: Peer SP will not allow creation of Disk Pool. It may be performing an operation on one of the requested disks. SP B: Bad FRU Configuration in RAID Group create.

What can I do now? Are the supposedly failed disks 1 and 3 good or not? And why is SPS B faulted when it is the new one I just installed?

195 Posts

February 29th, 2016 08:00

Disks 0-3 in the base enclosure are the vault drives where the code running the SPs lives. It is an extremely poor idea to leave even one of them failed for a second longer than absolutely necessary.

If you intend to continue using this storage (...and if you can ever get it healthy again...) you should purchase at least one spare drive to have on hand; when you use that one start the process of obtaining the next one immediately.

6 Posts

March 1st, 2016 00:00

Thank you Zaphod, I now understand how the storage array works.

The array state today is the same as yesterday, but a few minutes ago I tried again to create a disk pool with disks 0-3 and it worked! I defined disk 4 as a spare disk for this pool, and the system is currently initializing the two LUNs I created in it.

But write cache is still disabled because SPS B remains faulted. What I don't understand is that this SPS is new, and it has never worked properly or charged since I installed it.

Is there any procedure to initialize this SPS or force it to charge?

Latest status screenshots below:

Capture_disks.JPG.jpg

Capture_newax4.JPG.jpg

Capture_luns.JPG.jpg

195 Posts

March 1st, 2016 06:00

When swapping batteries I have run into occasions where a 'new' SPS did not come up properly.

What do the lights on the SPS tell you?  I have had occasions where the unit was all green, and the SPs didn't agree, and others when the unit itself had an amber light on it.

You should check the cabling for the sense cable.

It is possible that the cable can be bad, but I consider that unlikely in general unless it was physically traumatized (kinked, bent pins on the connectors etc.).

You can power cycle the SPS, as long as the other one is in good shape. Unplug it and leave it off for a minute or two, then plug it back in.

I have had occasions where a reboot of one, or both (one at a time of course), of the SPs was required to get things back in shape.

6 Posts

March 15th, 2016 02:00

Hello Zaphod, and sorry for the late answer.

I tried many times to force the SPS to restart, stopping SPB first each time. The SPS still stays in the same state: green LED ON at the back of the SPS and a faulted status in Navisphere. I checked the sense cable and nothing seems to be wrong with it.

The only thing I have not tried yet is plugging the original SPS back in. I will try that today.

4.5K Posts

March 15th, 2016 12:00

There are two things you can try:

1. Try rebooting SPB - sometimes the interface for the sense cable could be locked up internally - a reboot would release the sense cable interface (RS232)

2. Swap the sense cable on SPA to SPB

glen
