25 Posts
0
9627
July 10th, 2011 13:00
MD3220i and Volume Lock Contention when ESX 4.0 and ESXi 4.1 Hosts Share Volumes?
Okay ... fresh off a weekend-altering heart-stopper, so please bear with me if I ramble a bit.
We recently had to add an MD3220i to our network as we were running out of disk space on our MD3000i and, despite weeks of working with Dell tech support, we were unable to successfully add an MD1000 expansion shelf. Long story short, the MD3000i has issues and we are replacing it with the MD3220i.
We placed the MD3220i into production on Friday afternoon (I know, I know .... never do a major upgrade on Friday ..) and it seemed to be working well with our vSphere ESXi 4.1 host server. The ESX 4.0 host server, however, was another matter altogether: strange behavior, inconsistent connections, unreliable data writes, etc. After some trial and error and a nice 2-hour session with VMware support, we uncovered the fact that the MD3220i DOES NOT support ESX 4.0.
Okay ... so we began the arduous task of migrating the VMs to the ESXi 4.1 host so that we could then migrate the data from the MD3000i SAN to the new MD3220i SAN (since the ESX 4.0 server can't reliably read/write to/from the new MD3220i, we had to migrate the VMs to the ESXi 4.1 host first). This seemed to go all right, except performance was noticeably slow, then got progressively slower until this morning, when everything seemed to just "stop". No vCenter server available, VMs sometimes available, sometimes not, extremely slow performance when you could connect to them, etc., etc.
Another 3-hour support session with VMware tech support and the tech guy uncovered errors relating to locked LUNs on the MD3220i. One of us on the conference call came up with what turned out to be a brilliant idea: shut down the ESX 4.0 host, since it was the only possible system that could be locking the MD3220i LUNs. Even though the ESX 4.0 host is properly configured as a host group member on the MD3220i and has no active VMs with data traffic on any of the SANs, it still had the iSCSI initiator and port groups active for access to the MD3220i. Sure enough, as soon as the ESX 4.0 host powered off, everything sprang back to life on the ESXi 4.1 host and the lock errors ceased.
So here is my question (finally!): Is this problem strictly the result of the unsupported ESX 4.0 host having access to the MD3220i, or is there perhaps a bug with the MD3220i that does not properly support multiple host access? I'm just asking as I'd like to know if this is a known issue before I rebuild the old ESX 4.0 host as an ESXi 4.1 host and bring it online.
If anyone has any additional insight into this, it would be great to hear from you.
Dev Mgr
4 Operator
9.3K Posts
1
July 11th, 2011 07:00
Your iSCSI subnetting is a possible cause here.
Each port that you use on a controller needs to be on a different subnet from the other ports on that controller. The mirror port (the same-numbered port on the 2nd controller) needs to be on the matching subnet.
I suggest you re-do your whole subnetting to match best practice and then see what it does for you.
If you're short on subnets, I'd suggest something like this:
SAN:
Controller 0 iSCSI port 0: 192.168.230.101 (with a 255.255.255.128 subnetmask)
Controller 0 iSCSI port 1: 192.168.231.101 (with a 255.255.255.128 subnetmask)
Controller 0 iSCSI port 2: 192.168.230.201 (with a 255.255.255.128 subnetmask)
Controller 0 iSCSI port 3: 192.168.231.201 (with a 255.255.255.128 subnetmask)
Controller 1 iSCSI port 0: 192.168.230.102 (with a 255.255.255.128 subnetmask)
Controller 1 iSCSI port 1: 192.168.231.102 (with a 255.255.255.128 subnetmask)
Controller 1 iSCSI port 2: 192.168.230.202 (with a 255.255.255.128 subnetmask)
Controller 1 iSCSI port 3: 192.168.231.202 (with a 255.255.255.128 subnetmask)
Server:
vSwitch1: vmk1: iSCSI01: 192.168.230.130 (connecting to 0-0 and 1-0) (again with 25 bit subnetmask)
vSwitch2: vmk2: iSCSI02: 192.168.231.130 (connecting to 0-1 and 1-1) (again with 25 bit subnetmask)
vSwitch3: vmk3: iSCSI03: 192.168.230.230 (connecting to 0-2 and 1-2) (again with 25 bit subnetmask)
vSwitch4: vmk4: iSCSI04: 192.168.231.230 (connecting to 0-3 and 1-3) (again with 25 bit subnetmask)
This is basically extending this Dell article on how to set up an MD3000i with ESX 4.0 for usage with an MD3200i.
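As a rough check of the layout above, the sketch below uses Python's standard `ipaddress` module to work out which 25-bit subnet each SAN port lands in. The port labels (`0-0`, `1-0`, and so on) are an assumption inferred from the "connecting to 0-0 and 1-0" notes in the server list, and the helper names are made up for illustration.

```python
import ipaddress

MASK = "255.255.255.128"  # the 25-bit mask suggested in the post

# Assumed labels: "<controller>-<port>", paired with the addresses in the order listed.
san_ports = {
    "0-0": "192.168.230.101",
    "0-1": "192.168.231.101",
    "0-2": "192.168.230.201",
    "0-3": "192.168.231.201",
    "1-0": "192.168.230.102",
    "1-1": "192.168.231.102",
    "1-2": "192.168.230.202",
    "1-3": "192.168.231.202",
}

def subnet_of(ip, mask=MASK):
    """Return the network that an address belongs to under the given mask."""
    return ipaddress.ip_interface(f"{ip}/{mask}").network

def subnets_by_controller(ports):
    """Group the subnet of every port under its controller number."""
    result = {}
    for label, ip in ports.items():
        controller = label.split("-")[0]
        result.setdefault(controller, []).append(str(subnet_of(ip)))
    return result

layout = subnets_by_controller(san_ports)
for controller, nets in layout.items():
    print(f"controller {controller}: {nets}")
```

With these addresses, each controller ends up with four distinct subnets, and the same-numbered port on the other controller shares its subnet.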
Dev Mgr
4 Operator
9.3K Posts
0
July 10th, 2011 18:00
As I always check the support matrix of a SAN before attaching a host to make sure the OS is supported (or which patch/service pack is needed), I haven't ever tried ESX 4.0 to an MD3200-series system.
When you got to the part where you realized that 4.0 wasn't supported with this SAN, my first thought was: so you upgraded to 4.1? That would either fix your issue or allow VMware support to keep looking for other causes.
One note though; I assume you are using different subnets on each of the ports on a controller? Kind of like the MD3000i's recommendations.
E.g.:
Controller 0 iSCSI port 0: 192.168.130.101
Controller 0 iSCSI port 1: 192.168.131.101
Controller 0 iSCSI port 2: 192.168.132.101
Controller 0 iSCSI port 3: 192.168.133.101
Controller 1 iSCSI port 0: 192.168.130.102
Controller 1 iSCSI port 1: 192.168.131.102
Controller 1 iSCSI port 2: 192.168.132.102
Controller 1 iSCSI port 3: 192.168.133.102
And not reduced to a single or 2 subnets?
manofbronze
25 Posts
0
July 10th, 2011 20:00
Normally, I do check compatibility; however, in this case, since Dell was providing the unit as a replacement for the defective MD3000i, and they knew we were using ESX 4.0, I mistakenly assumed the product was already vetted. Also, sometimes when you are in high-speed reactionary mode you overlook a few things ...
I am upgrading the ESX 4.0 to ESXi 4.1. Since this requires a rebuild of the ESX 4.0 system, I first needed to migrate the virtual machines to the existing ESXi 4.1 hosts. It was during this process, which requires both hosts to be active, that we encountered the problems.
Actually, we are not using a different subnet for each port. We have a different subnet for each controller and each port has a unique address. This is a configuration we have used many times in the past with the MD3200 series (per Dell-assisted installs) and, until now, have never found it to be an issue. Our port config looks like this:
Controller 0 iSCSI port 0: 192.168.230.101
Controller 0 iSCSI port 1: 192.168.230.102
Controller 0 iSCSI port 2: 192.168.230.103
Controller 0 iSCSI port 3: 192.168.230.104
Controller 1 iSCSI port 0: 192.168.231.101
Controller 1 iSCSI port 1: 192.168.231.102
Controller 1 iSCSI port 2: 192.168.231.103
Controller 1 iSCSI port 3: 192.168.231.104
We have (4) vmnics assigned to our iSCSI vSwitch with (8) vmkernel ports, each bound to a single NIC. The config is like so:
Port "iSCSI01" vmk1 IP: 192.168.230.230 vmnic1
Port "iSCSI02" vmk1 IP: 192.168.231.230 vmnic2
Port "iSCSI03" vmk1 IP: 192.168.230.231 vmnic3
Port "iSCSI04" vmk1 IP: 192.168.231.231 vmnic4
Port "iSCSI05" vmk1 IP: 192.168.230.232 vmnic1
Port "iSCSI06" vmk1 IP: 192.168.231.232 vmnic2
Port "iSCSI07" vmk1 IP: 192.168.230.233 vmnic3
Port "iSCSI08" vmk1 IP: 192.168.231.233 vmnic4
Since I know I can ALWAYS learn a thing or two from other people, I had the VMware tech review my iSCSI initiator and vSwitch configurations. I also had Dell tech support review them, and in both cases was given the green light; however, if you believe this type of port assignment has a flaw, I am willing to listen to your reasoning. If I have been given bad advice, I'd like to know that.
Thanks for your prompt response Dev Mgr! It helps to have another perspective on this issue.
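One way to test the port config above against the "different subnet on each port of a controller" question is sketched below. The post does not state a subnet mask, so `255.255.255.0` is an assumption here, and the function name is made up for illustration.

```python
import ipaddress
from collections import Counter

ASSUMED_MASK = "255.255.255.0"  # assumption: the mask is not given in the post

# (controller, port) -> address, as listed in the post
ports = {
    ("0", 0): "192.168.230.101",
    ("0", 1): "192.168.230.102",
    ("0", 2): "192.168.230.103",
    ("0", 3): "192.168.230.104",
    ("1", 0): "192.168.231.101",
    ("1", 1): "192.168.231.102",
    ("1", 2): "192.168.231.103",
    ("1", 3): "192.168.231.104",
}

def shared_subnets(ports, mask):
    """Return {(controller, subnet): count} for subnets used by more than one port."""
    counts = Counter()
    for (controller, _port), ip in ports.items():
        net = ipaddress.ip_interface(f"{ip}/{mask}").network
        counts[(controller, str(net))] += 1
    return {key: n for key, n in counts.items() if n > 1}

conflicts = shared_subnets(ports, ASSUMED_MASK)
for (controller, net), n in conflicts.items():
    print(f"controller {controller}: {n} ports share {net}")
```

Under that assumed mask, all four ports on each controller sit in a single subnet, which is the situation Dev Mgr was asking about.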
manofbronze
25 Posts
0
July 12th, 2011 06:00
Dev Mgr,
You are exactly correct!
I had only (2) subnets (one for each controller) and was oversubscribing my vmnics, causing too many iSCSI connections from a single host to the MD3220i. Somehow, I had it in my head this was the "correct" way to configure an MD3xxx series SAN. Lesson learned ....
Thank you for your assistance.
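To put a rough number on "too many iSCSI connections", the sketch below counts how many (vmkernel port, SAN port) pairs share a subnet in each layout from this thread. It rests on several assumptions: one connection per such pair, a `255.255.255.0` mask for the original two-subnet layout (the mask was never stated), and server addresses ending in 120 rather than 130 for the suggested layout, following the correction posted elsewhere in the thread. The function name is hypothetical.

```python
import ipaddress

def count_pairs(vmk_ips, target_ips, mask):
    """Count (vmkernel port, SAN port) pairs that fall in the same subnet."""
    total = 0
    for vmk in vmk_ips:
        net = ipaddress.ip_interface(f"{vmk}/{mask}").network
        total += sum(ipaddress.ip_address(t) in net for t in target_ips)
    return total

# Original layout: two subnets, eight vmkernel ports (mask assumed to be /24).
old_targets = [f"192.168.{net}.{host}"
               for net in (230, 231) for host in (101, 102, 103, 104)]
old_vmks = [f"192.168.{net}.{host}"
            for host in (230, 231, 232, 233) for net in (230, 231)]

# Suggested layout: four /25 subnets, four vmkernel ports.
new_targets = [f"192.168.{net}.{host}"
               for host in (101, 201, 102, 202) for net in (230, 231)]
new_vmks = ["192.168.230.120", "192.168.231.120",
            "192.168.230.230", "192.168.231.230"]

old_count = count_pairs(old_vmks, old_targets, "255.255.255.0")
new_count = count_pairs(new_vmks, new_targets, "255.255.255.128")
print(f"original layout: {old_count} pairs, suggested layout: {new_count} pairs")
```

Under these assumptions the count drops from 32 pairs to 8, with each vmkernel port reaching exactly one port on each controller.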
manofbronze
25 Posts
0
July 12th, 2011 09:00
Yes. I caught that ... but I knew what you meant :)
Dev Mgr
4 Operator
9.3K Posts
0
July 12th, 2011 09:00
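I just realized one mistake in my IP suggestions: the first two server addresses (vmk1 and vmk2) shouldn't be using "130" as the last octet, but 120 or so (below 128), so that they fall in the same 25-bit subnet as the .101/.102 SAN ports.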
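The effect of that correction can be sketched with Python's standard `ipaddress` module. The SAN port labels (`0-0`, `1-0`, and so on) are an assumption inferred from the "connecting to 0-0 and 1-0" notes in the suggested layout, and `reachable` is a made-up helper name.

```python
import ipaddress

MASK = "255.255.255.128"  # the 25-bit mask from the suggested layout

# Assumed labels: "<controller>-<port>"
san_ports = {
    "0-0": "192.168.230.101",
    "0-1": "192.168.231.101",
    "0-2": "192.168.230.201",
    "0-3": "192.168.231.201",
    "1-0": "192.168.230.102",
    "1-1": "192.168.231.102",
    "1-2": "192.168.230.202",
    "1-3": "192.168.231.202",
}

def reachable(vmk_ip, ports=san_ports, mask=MASK):
    """List the SAN ports that share a subnet with the given vmkernel address."""
    net = ipaddress.ip_interface(f"{vmk_ip}/{mask}").network
    return [label for label, ip in ports.items()
            if ipaddress.ip_address(ip) in net]

as_posted = reachable("192.168.230.130")   # address from the original suggestion
corrected = reachable("192.168.230.120")   # address after the correction
print("as posted:", as_posted)
print("corrected:", corrected)
```

With a last octet of 130 the address falls in the upper /25 and lines up with ports 0-2 and 1-2 instead of the intended 0-0 and 1-0; anything below 128 puts it back in the lower /25.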