All right, we’re going to look at Unisphere and see if we could determine types of faults, take you through the scenarios that we might see. So, at this point I’ve logged on to Unisphere. I’ve just simply supplied an IP address here. I'm going to click on the little house here and that's going to bring me back to the main landing page, what we refer to as the dashboard. From the dashboard I see the system.
There’s only one system in this particular configuration. Had there been more than one, I could have gone up here and checked to see that. So, I’m going to double-click on that system and then I’m going to take a look around at some of the system hardware. So, from the system main menu I can go ahead and look at storage hardware. And I can see in this case, it brings up the hardware on the left-hand side with a picture of the array in question here.
So, if I were to click on that, I see the bus enclosure 0, which in our case is a disk processor enclosure. In this particular instance it happens to be a VNX 5600. Note that I can expand each one of these trees out; I refer to this as the storage hardware tree. And by doing so I can go ahead and click on the plus sign and then that will in turn give me other objects I can expand on.
I’m going to do this for the disk because I see there’s a fault here on this bus 1 enclosure 1. And if I were to click on that I can see a pictorial of that enclosure. And so, I’m going down to see if I can drill a little farther to see what the problem is. So I have disks that show up as good and unbound in these cases. And I see the fault right here on this particular disk. It’s disk one, underscore one, underscore 17, and that happens to be showing up as a “removed” disk.
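As a side note, that disk name, 1_1_17, reads as bus, enclosure, and disk position, in that order. Here is a minimal Python sketch of pulling those three numbers apart. The names `DiskLocation`, `parse_disk_id`, and `describe` are made up for illustration and are not part of Unisphere.

```python
from typing import NamedTuple


class DiskLocation(NamedTuple):
    """Hypothetical container for the three parts of a disk ID."""
    bus: int
    enclosure: int
    slot: int


def parse_disk_id(disk_id: str) -> DiskLocation:
    """Split an ID such as '1_1_17' into bus, enclosure, and slot numbers."""
    parts = disk_id.strip().split("_")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected bus_enclosure_disk, got {disk_id!r}")
    bus, enclosure, slot = (int(p) for p in parts)
    return DiskLocation(bus, enclosure, slot)


def describe(loc: DiskLocation) -> str:
    """Spell the location out the way it is read aloud."""
    return f"bus {loc.bus}, enclosure {loc.enclosure}, disk {loc.slot}"


print(describe(parse_disk_id("1_1_17")))  # bus 1, enclosure 1, disk 17
```

So when I say I’m looking for 1_1_17, I’m looking in bus 1, enclosure 1, at disk 17.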
If I click on that, I can right-click on it to view the properties. I can see that the disk is in the state of removed, while I had that disk in there originally, and it was spun up and powered up. So there seems to be an issue here on that particular disk. So, I say “OK” to that. Note also that the disk shows up as red; had the disk been a good disk, for example, 16, it would show up as green.
So, I know a couple of things here. I know it’s in DAE 1, disk array enclosure 1, and I know its location within that disk array enclosure. If I was in a large configuration, of course, I could still right-click on here and select the properties. And from that I should be able to see additional things here as well. Okay, so I’ve noticed I’ve got an error in the storage hardware section. If I go back up to the system and I look at the system hardware for file, I see all this looks pretty good, as indicated by the green checkmark.
So, now I can verify whether there are any other reports that are giving me the same issue as I’ve seen with the disk. I could do that in a number of places. I’ll start here with the monitoring section and click on the fault status report. And I can see again it just tells me what I already knew from looking at the hardware tree, that bus 1 enclosure 1 is faulted and the disk is removed.
Now, errors are also logged in the storage processor event log. So, if I were to go to System and, under Monitoring and Alerts, look at the SP event logs, I’ll do SP A in this case, and I’ll see if I can again verify the same things that I was getting from the other views. Anything indicated in red is typically an error. I have “I” for informational events, I have “X” for critical errors, and I have yellow for not so critical, or warnings in this case.
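If you were sorting through log entries outside of the GUI, the same idea of triaging by severity could be sketched like this. This is only an illustration: the dictionary layout, the severity labels, and the sample rows are assumptions made up for the example, not an actual Unisphere export format.

```python
# Hypothetical severity labels, ordered from most to least urgent.
SEVERITY_RANK = {"critical": 0, "error": 1, "warning": 2, "info": 3}


def triage(events):
    """Drop informational entries and return the rest, most severe first."""
    flagged = [e for e in events if e["severity"] != "info"]
    return sorted(flagged, key=lambda e: SEVERITY_RANK[e["severity"]])


# Made-up sample rows, loosely modeled on what this demo shows on screen.
sample_events = [
    {"severity": "info", "description": "Routine status message"},
    {"severity": "warning", "description": "Something worth watching"},
    {"severity": "error", "description": "Bus 1 Enclosure 1 is faulted"},
    {"severity": "critical", "description": "Disk 1_1_17 is removed"},
]

for event in triage(sample_events):
    print(event["severity"], "-", event["description"])
```

The informational rows drop out, and what is left is the short list I actually want to double-click on.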
So, the type of icon that you see depends on the severity. So, I’m going to take a look at this because I see that it has to do with a bus. So, again, I can right-click, or I can double-click, I should say, and I can get information on that. And again, it tells me that it’s faulted. It happens to be faulted on the 12th of July 2013; well, actually, that’s when the event was logged.
So, I’m going to say “OK” to that, and if I do the same thing on event line 49, I can see the bus enclosure is faulted. So, I can look at it there as well. So, that sort of confirms what I know about the disk. If I were to look at storage, well, there are no storage pools set up, so I can also look at the hardware section of this and look at disks. And I’m looking for disk 17, 1_1_17, and the list starts at 0_0_0, so I’m going to have to come down a bit until I find enclosure 1. And just coming up here is bus 1 enclosure 0, and I’m looking for bus 1, enclosure 1, disk 17, and here’s the disk here. It shows up as “no capacity” because it is removed.
So, at this point I’m pretty confident that I need to replace that disk. And so, there are a few places I could do that. I could do it with USM, which is what we’re going to use here. But to launch USM, I can do that from a couple of places. I could do it here from replace disk, or I could come over and replace a faulted disk under the service tasks. In either case, it’s going to launch USM, so I’m going to do that here. What it will do is launch the hardware replacement utility within USM, with “replace faulted disk”, and I’m going to go ahead and select that.
That is going to launch the wizard. So, you can go ahead and read this, but basically it’s going to determine if there are any faulted disks within the system that are candidates for replacement. So, I’ll select “Next” to do that. It will go through an analysis of the system and check everything here. Now, you see here that it’s done several checks, one of which was “identify a replacement candidate”, with a green check.
That, to me, indicates that I do have a disk that needs replacement. It’s identified a replacement candidate. So, for the analysis, I’m going to go ahead and select “check”. So, what I can see from this summary is that the analysis completed and the disk drive faults were verified in the storage system as a candidate for a drive replacement. The drive that is the candidate is shown here. So, one disk is detected that is supported by the wizard, 1_1_17.
It gives me the part number, so at this point, I can select “Next”. I can see from this picture now that the replacement drive is showing up in its respective spot within the disk array enclosure. And if I had, again, several enclosures in there or a large configuration, I could go ahead and turn on the enclosure LEDs by selecting this tab. That would blink all the LEDs within the enclosure amber; it would allow one to look at a large configuration and determine specifically which enclosure you are trying to work on. Well, from this point I can continue. I need to generate the replacement instructions, and I’m going to do that now by selecting “Replacement instructions”.
That launches “how to replace the disk module”, so in this case, this is going to take me through all the procedures to do that. It gives you specific guidelines on removing the disk module. It gives you specific guidelines on attaching the ESD wrist band. This is important for any component within the storage system, not just disks. And it comes on down to show me how to remove the front covers and do everything that is applicable to that replacement. So, I can follow that, and I can look at the table of contents for that. And so at this point, I need to manually go ahead and replace that disk before I can move on.
Now, the other things you’re going to do, of course, is once you go in and you start replacing the disk, you would follow the procedure. If you’re in a unified system, that typically requires you to disable the connect home and email services that are started within that system. And then once you’ve replaced the component and you verified its functionality, you can go ahead and enable the connect home and email as well. I’m going to close that out here. I’ll close the tab. And at this point, I need to perform the operation.
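The order matters here: disable connect home and email, replace the component, verify it, and only then enable them again. A minimal Python sketch of that ordering follows. The function `run_replacement` and its callbacks are hypothetical stand-ins for manual steps; nothing here calls Unisphere or USM.

```python
def run_replacement(replace_disk, verify_disk, log):
    """Record the steps in order; re-enable notifications only after verification.

    replace_disk and verify_disk are hypothetical callbacks standing in for
    the manual work; log is a plain list that collects what was done.
    """
    log.append("disable connect home and email")
    replace_disk()
    log.append("disk replaced")
    if not verify_disk():
        # Leave notifications off so a half-finished repair does not dial home.
        log.append("verification failed; connect home and email left disabled")
        return False
    log.append("enable connect home and email")
    return True


steps = []
ok = run_replacement(lambda: None, lambda: True, steps)
print(ok)
for step in steps:
    print("-", step)
```

The point of the sketch is just that the enable step sits behind the verification, which is the same order the procedure asks you to follow.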
So, you can see here that the disk has been replaced, as evidenced by its status of a green check, and its location within the disk array enclosure itself has turned green. So, in this case it happened to be a DAE that consists of 25 slots, and at this point I’m confident that it’s fixed. I’m going to go ahead and select “Next”.
That’s going to show me some of the summaries of the completed operation. And then I can go ahead and click “Next”, and then you're going to have several screens that ask you to provide some information in terms of who did the repair and so forth, and you would provide that information depending on your site. I'm just going to uncheck that for the purposes of this and cancel out.
And at that point, we can go ahead and see if it’s okay. And we had a fault showing before, so I’m going to go back to verify that those have gone away by launching the system. I need to exit USM at this point so I’m going to close it out. Close that application. And I can see down here, I’m going to try to refresh and see if it has come to life, and you can see it has. Even though it’s unbound, it now shows a capacity and that's what we're looking for so it no longer shows it as removed. So, I’m pretty confident it has been fixed and the new disk is correctly in place.
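The check I’m doing by eye there is simple: the disk should no longer be in the removed state, and it should report a capacity. As a rough sketch in Python, assuming a made-up dictionary of disk properties (the keys `state` and `capacity_gb` and the capacity value are placeholders, not real Unisphere field names or readings):

```python
def replacement_looks_good(disk):
    """Return True if the disk is not 'removed' and reports some capacity."""
    state = disk.get("state", "").lower()
    return state != "removed" and disk.get("capacity_gb", 0) > 0


# Placeholder property sets for the same slot, before and after the swap.
before = {"id": "1_1_17", "state": "Removed", "capacity_gb": 0}
after = {"id": "1_1_17", "state": "Unbound", "capacity_gb": 100.0}

print(replacement_looks_good(before))  # False
print(replacement_looks_good(after))   # True
```

Note that “unbound” passes the check; unbound only means the disk is not yet configured into anything.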
If I go to “System” and then to “Hardware”, I can go to “Storage hardware”. I can see that the fault the DAE had prior to the replacement has now gone away, and I would expand that and just verify that disk 17 shows, and it does show up; it shows up as “unbound”. In other words, it’s just not configured in any kind of pool or RAID group at this point.
And I look at it and it shows the status of green, so at this point, I can go ahead and enable connect home and email, and that will complete the operation. So, that was just a little scenario of how to determine faults with Unisphere and then, in the case of a disk fault, use USM to go ahead and fix that particular fault condition, and then finally, verify that the condition has cleared.