2 Posts

October 15th, 2007 11:00

I updated 11 hosts from PowerPath 4.5.1 (EMCpower.LINUX.4.5.1.022) on the RHEL4u3 kernel (2.6.9-34.0.2.ELsmp) to PowerPath 5.0 (EMCpower.LINUX-5.0.0-157) on the RHEL4.5 kernel (2.6.9-55.02.ELsmp).

The old kernel and PowerPath were a very stable combination for me; the only reason I updated was to correct the ext3 file system bug (for details, see https://kbase.redhat.com/faq/FAQ_85_9610).

I experienced several severe problems with the new combination (some hosts failed in less than 24 hours, some within 72 hours, and the longest held out for two weeks). For the most part the combination seemed not to like Oracle or IBM WebSphere application servers. I'm not sure whether that is load related.

I downgraded to the 2.6.9-42.0.10.ELsmp kernel with PowerPath 5.0, and this seems stable on a lab host that had been crashing every 2 - 3 days. In total, 5 hosts exhibited problems with the newer kernel and PowerPath 5.0 pairing.

Symptoms were as follows:
1. Unable to log in to the host either via ssh or directly on the console.

We were able to log in to a few of these hosts early on, and what we noticed is that the iowait value seems to grow in steps and then hold steady at each level.
It would show a solid average of 25%, then a while later a solid 50%, and eventually it probably goes to 100%, at which point the host is degraded and no longer providing services.
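
For anyone who wants to log the iowait trend before a host becomes unreachable, here is a minimal sketch of how the percentage can be derived from two samples of the aggregate "cpu" line in /proc/stat. It assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, and steal on newer kernels); the function names are illustrative and not taken from any EMC or Red Hat tool.

```python
import time


def parse_cpu_line(line):
    """Return the aggregate CPU counters (in jiffies) from a /proc/stat 'cpu' line."""
    fields = line.split()
    if len(fields) < 6 or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Assumed field order: user nice system idle iowait irq softirq [steal].
    # Guest counters on newer kernels are already included in user/nice, so skip them.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    old = parse_cpu_line(before)
    new = parse_cpu_line(after)
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat_file:
        return stat_file.readline()


def sample_iowait(interval=5.0):
    """Take two samples `interval` seconds apart and return the iowait percentage."""
    before = read_cpu_line()
    time.sleep(interval)
    after = read_cpu_line()
    return iowait_percent(before, after)
```

Calling sample_iowait() in a loop and appending the result to a log file with a timestamp would record when each step occurs. Tools such as top or iostat report a comparable figure if they are still usable on the host.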

2 Posts

October 18th, 2007 06:00

Any hints on what Hot Fix 1 (5.0.0-202) provides? Does PowerPath 5.0.1, which is available for download, include this fix? I had opened a Powerlink case, but it was closed for two reasons.

1. We are using IBM xSeries x345 hardware, which doesn't show up in the current elab blessed matrix. I later found out that our storage group already had this blessed with an RPQ. We are different teams, and they are supposed to perform all of the elab support checks.

2. Overall, Linux (Redhat) update level errata patching and support statements. Don't get me started on this one.

From the EMC technician I spoke with, I got the impression that reports of this issue are coming in to EMC, but he didn't go into any detail. I have a case on this open with HP, which has been escalated to Red Hat. Below is an excerpt from the latest update I received from Red Hat Engineering:

Here is where the analysis is at. Note the issue with EMC's Powerpath.

The request makes it out of the elevator, but never gets out of endio. That leaves 4 distinct areas where something is going wrong:

scsi midlayer
scsi driver
hardware
PowerPath

Unfortunately, PowerPath is in the critical path here. As PowerPath is a black box, we're unable to do any debugging here.

Do you know the extent of EMC's consulting about this issue?

Has EMC looked over this issue? Do you know whether anything has been done?