PowerPath problems with the latest RHEL4 ES kernels

Question

Hi, is anyone successfully running PowerPath (EMCpower.LINUX-5.0.0-157) with RHEL4U5 kernels (2.6.9-55.*.ELsmp)? In my case the server works fine for a while, and then (depending on workload, I think) lots of processes go into 'D' state and the system load average grows very high (it is not even possible to run 'ps'). I have noticed this behavior only on hosts with PowerPath installed and /dev/emcpower* devices mounted. Without PowerPath, with the SAN devices mounted directly, the same hosts and the same applications (Oracle, MySQL, Lotus Notes Domino) work fine on the latest kernels! The servers are IBM blades (HS20 type 8443). Any ideas? The latest kernel I have working with PowerPath is 2.6.9-42.0.10.ELsmp.
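For anyone trying to confirm the same symptom, here is a minimal sketch (not part of the original post) that lists processes in 'D' (uninterruptible sleep) state by reading /proc/<pid>/stat directly. It assumes a Python 3 interpreter is available on the host, which is not the default on RHEL4, so it may need adapting. The function names parse_stat and d_state_processes are made up for illustration.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate the LAST ')' instead of splitting on spaces.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state 'D'."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry could not be parsed
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

If the processes it reports are mostly the ones doing I/O against /dev/emcpower* devices, that would be consistent with the behavior described above, though it does not by itself show where the I/O is stuck.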

cdarn76 · Answer

I updated 11 hosts from PowerPath 4.5.1 (EMCpower.LINUX.4.5.1.022) with the RHEL4U3 kernel (2.6.9-34.0.2.ELsmp) to PowerPath 5.0 (EMCpower.LINUX-5.0.0-157) with the RHEL4.5 kernel (2.6.9-55.02.ELsmp).

The old kernel and PowerPath were a very stable combination for me; the only reason I updated was to pick up the fix for the ext3 file system bug (for details, see https://kbase.redhat.com/faq/FAQ_85_9610).

I experienced several severe problems with the new combination (some hosts failed in less than 24 hours, some within 72 hours, and the longest took two weeks). For the most part the combination seemed not to like Oracle or IBM WebSphere application servers. I'm not sure whether that is load related.

I downgraded to the 2.6.9-42.0.10.ELsmp kernel with PowerPath 5.0, and this seems stable on a lab host that had been crashing every 2 - 3 days. In total, 5 hosts exhibited problems with the newer kernel and PowerPath pairing.

Symptoms were as follows:
1. Unable to log in to the host, either via ssh or directly on the console.

We have been able to log in to a few of these hosts early, and what we noticed is that the iowait value seems to grow in steps and then hold steady.
It would show a solid average of 25%, then a while later a solid 50%, and eventually probably reach 100%, at which point the host is degraded and no longer providing services.
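To watch that iowait climb without relying on other tools, here is a minimal sketch that takes two samples of the aggregate "cpu" line in /proc/stat and reports the iowait share of the elapsed CPU time. It assumes Python 3 and the standard field order (user, nice, system, idle, iowait, ...). The function names cpu_fields, iowait_percent and read_stat are made up for illustration.

```python
import time


def cpu_fields(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            # Keep at most user..steal (8 fields); later guest fields are
            # already counted inside user/nice and would be double counted.
            return [int(v) for v in parts[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(cpu_fields(before), cpu_fields(after))]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_stat(path="/proc/stat"):
    with open(path) as fh:
        return fh.read()


if __name__ == "__main__":
    try:
        first = read_stat()
        time.sleep(1)
        second = read_stat()
        print("iowait: %.1f%%" % iowait_percent(first, second))
    except OSError:
        print("cannot read /proc/stat")
```

Run in a loop or from cron, this would show whether iowait is moving through the plateaus described above. A longer sampling interval gives a smoother number.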

cdarn76 · Answer

Any hints on what Hot Fix 1 (5.0.0-202) provides? Does PowerPath 5.0.1, which is available for download, include this fix? I had opened a Powerlink case, but it was closed out for two reasons.

1. We are using IBM xSeries x345 hardware, which doesn't show up in the current E-Lab blessed matrix. I later found out that our storage group already had this blessed with an RPQ. We are different teams, and they are supposed to perform all of the E-Lab support checks.

2. Overall, Linux (Red Hat) update-level errata patching and support statements. Don't get me started on this one.

From the EMC technician I spoke with, I got the impression that reports of this issue are coming into EMC, but he didn't go into any detail. I have a case open on this with HP, which has been escalated to Red Hat. Below is an excerpt from the last update I received from Red Hat Engineering:

Here is where the analysis is at. Note the issue with EMC's Powerpath.

The request makes it out of the elevator, but never gets out of endio. That leaves 4 distinct areas where something is going wrong:

scsi midlayer
scsi driver
hardware
power path

Unfortunately, PowerPath is in the critical path here. As PowerPath is a black box, we're unable to do any debugging here.

Do you know the extent of EMC's consulting about this issue?

Has EMC looked over this issue? Do you know whether anything has been done?
