LUN ID 0 issue with clustered hosts

Question

Not sure if anyone else has run into this, but we have seen it on our VMAX 40K and 10K arrays, and only on clustered Unix/Linux hosts.

Problem

The Storage Group and Masking View are created with Unisphere, and LUN IDs are assigned automatically. The host sees the LUNs and there is no issue. At some point a change is made and the LUN that was assigned LUN ID 0 is removed from the host. Then a new LUN is presented and gets assigned LUN ID 0, and only one of the clustered servers sees it rather than all of them.

In our environment, one of the hosts we have seen this on is our Oracle T4 cluster (3 nodes) running Solaris 11.1. The Masking View was created with 4 LUNs provisioned: LUN ID 0 (200GB), 1 (500GB), 2 (500GB), and 3 (500GB). The server team needed us to change the 200GB LUN to a 250GB LUN, so we removed the 200GB LUN and provisioned a new 250GB LUN, which automatically got assigned LUN ID 0. The new 250GB LUN can be seen on node 1 but not on node 2 or 3.

We have been using a workaround of placing a 3MB gatekeeper on these hosts and assigning it LUN 0 when we add it to the storage group, but we wanted to know why we see this issue. I know that LUN 0 is normally used for boot LUNs, but I wasn't sure whether that was the cause. And if it is, why does it work the first time the Masking View is created but not after that?

cincystorage · Answer

Are you running PowerPath? What does powermt display dev=all show? Also, you can assign the LUN a different host LUN ID when you add it to the masking view.

symaccess -sid <SymmID> -name <StorageGroupName> -type storage add devs <SymDevName> -lun <LunID>

If -lun is not specified, numbering starts at 0 and goes up.
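
To illustrate the auto-numbering, here is a minimal Python sketch. It assumes the array simply hands out the lowest LUN ID not already in use in the view, which is the behavior described in this thread, not a statement about the actual array logic. The function name and the data are hypothetical.

```python
def next_free_lun_id(used_ids):
    """Return the lowest non-negative LUN ID not in used_ids (assumed policy)."""
    candidate = 0
    while candidate in used_ids:
        candidate += 1
    return candidate


# Scenario from the question: four LUNs at IDs 0-3 (values are sizes in GB).
view = {0: 200, 1: 500, 2: 500, 3: 500}

# The 200GB LUN at ID 0 is removed...
del view[0]

# ...and the replacement 250GB LUN is added without an explicit ID.
new_id = next_free_lun_id(view)
view[new_id] = 250
print(new_id)  # 0

# Gatekeeper workaround: ID 0 stays occupied, so a replacement never lands on 0.
pinned_view = {0: "gatekeeper", 1: 200, 2: 500, 3: 500, 4: 500}
del pinned_view[1]
print(next_free_lun_id(pinned_view))  # 1
```

Under this assumption, the only ways to keep a new device off ID 0 are to keep ID 0 occupied or to pass an explicit ID with -lun.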

PedalHarder · Answer

When you say nodes 2 and 3 can't see LUN 0, where are the admins looking? There are various layers. Are they saying the cluster software does not see the LUN? Have you checked that the HBA can see the LUN? Has a reboot been performed as a test to see if LUN 0 appears afterward?
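
One way to organize that check is to write down the LUN IDs each node reports at a given layer and compare them. A minimal Python sketch follows, with hypothetical node names and hand-entered data; how you actually collect the IDs depends on the OS and multipath stack.

```python
def missing_luns_by_node(seen):
    """Map each node to the LUN IDs that some other node sees but it does not."""
    all_ids = set().union(*seen.values())
    return {
        node: sorted(all_ids - ids)
        for node, ids in seen.items()
        if all_ids - ids
    }


# Hypothetical example: LUN IDs visible at the HBA layer on each cluster node.
seen_by_hba = {
    "node1": {0, 1, 2, 3},
    "node2": {1, 2, 3},
    "node3": {1, 2, 3},
}

print(missing_luns_by_node(seen_by_hba))  # {'node2': [0], 'node3': [0]}
```

Running the same comparison per layer (HBA, multipath, cluster software) shows where the LUN first goes missing.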

sauravrohilla · Answer

When the LUN with ID 0 was removed from the MV and another LUN was added later, it is more than likely that the newly added LUN got the lowest available ID (0 in your case). So this is normal behavior.

But I don't understand why the other 2 nodes do not see the newly added LUN.

Do you have a single clustered masking view with cascaded IGs (for all the nodes)? If yes, then the problem is more likely to be on those two nodes. Perhaps the scan did not complete successfully on those two nodes?

Or the HBAs were able to see it but PowerPath did not let the devices come up on the hosts?

Is it possible for you to reproduce the problem, run emcgrab on all three nodes, and open a support ticket? Let's see what they say.

regards,

Saurabh

EMCAlum · Answer

We have tried rebooting before and it did not clear the issue. No matter how we rescan on the other nodes, the HBAs do not seem to see the new LUN. As I said, this is only a problem when we remove LUN 0 and then provision another LUN to that Masking View.

EMCAlum · Answer

We are not running PowerPath but native MPIO. The servers I listed are just one example; we have also had this issue with our Xen servers running Red Hat Enterprise Linux 5.9.

I will have to get with our system admins to see if we can find a host on which to reproduce the error. We have also manually assigned LUN IDs on some hosts, and one of those may be usable for reproducing the issue so I can get the grabs run and open a ticket.

As for the setup, it depends on the host. The Xen cluster is a 2-node cluster with all 4 HBAs in one IG. For the T4 cluster we use cascaded IGs and cascaded SGs so that we can separate the LUNs for each DB when troubleshooting.

johncampbell1 · Answer

Did you get anywhere with this one? I am interested because I am in a similar position, needing to introduce a new boot LUN (which must be ID 0) into existing MVs. While I understand the symcli part that does the "add dev -lun 0", I'm wondering what the effect will be on the existing LUN 0 device. Or do they all simply get renumbered upward when the MV is created again with the additional boot device?

JGroce213 · Answer

Hello,

Hope this finds you well.

We see this in our environment quite a bit.
It seems to be related to how the gatekeepers are assigned to the FA ports.

Basically, as we expanded our VMAX, the gatekeepers on the FAs got a little out of sync. So when we map a host on PG 1-32, it has a different number of GKs than when we map to PG 33-64. Our workaround is to stay within the same group, meaning either the first-build PGs or the expansion-build PGs.

Whenever we run native MPIO it seems to enumerate the GKs, so if the gatekeeper counts on the port groups differ, the cluster nodes get upset.

Hope this helps.

johncampbell1 · Answer

Thank you for your interest. My issue is essentially what happens to any existing LUN ID 0 when I attempt to add a new boot LUN at ID 0.

The original parent view is deleted. I'm trying to work out what happens to the former ID 0 when I attempt to introduce a new ID 0 into the unpresented SG.

One way to find out, I guess...

JGroce213 · Answer

John,

Thank you so much for the clarification; sorry for my confusion.
We see that too. When we migrate from CLARiiON (we use Open Replicator for this), LUN 0 always seems to keep its LUN 0 on the VMAX, and the former LUN 0s move up the line (most of the time only that one former LUN 0 changes).

These two things are probably our biggest gotchas with clusters and native MPIO.

Hope this helps, have a great day.
