April 19th, 2016 07:00

PowerPath AIX LPM and errpt message question

We had a couple of servers hang or crash during LPMs recently, and IBM & EMC support suggested we change the reserve_policy of all our LPAR hdisks & hdiskpowers to no_reserve, which we did.  Now when we LPM a test LPAR, we sometimes see a CONNECTION FAILURE errpt message for every hdisk under a hdiskpower, plus a FUNCTION DEGRADED errpt message for the hdiskpower itself (see below).  EMC says not to be concerned: these are just informational messages.  My boss says losing all paths to a hdiskpower device can't be good.  What messages are 'normal' in errpt when running LPM with PowerPath?

A5E6DB96   0414083416 I S pmig           Client Partition Migration Completed
1BBD20F4   0414083416 I H hdiskpower3    FUNCTION DEGRADED
C6E26F3B   0414083416 I H hdisk19        BACK-UP PATH STATUS CHANGE
1E67811B   0414083316 P H hdisk19        UNABLE TO COMMUNICATE WITH DEVICE
516A2BC4   0414083316 P H hdisk9         CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk1         CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk14        CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk19        CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk17        CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk6         CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk2         CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk12        CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk13        CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk5         CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk16        CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk18        CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk11        CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk0         CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk8         CONNECTION FAILURE
08917DC6   0414083316 I S pmig           Client Partition Migration Started

1.2K Posts

April 19th, 2016 10:00

Incidentally, which version of PP are you on and which ODM files do you have installed?

PowerPath uses a unique identifier for each detected device-path pair.  With LPM, you migrate the disk from one set of logged-in host initiators to another set of host initiators.  From PowerPath's perspective, the host device is no longer accessible on the current device-path, so PowerPath marks that path unavailable.  PowerPath reports these errors to the errpt facility, where they get flagged as such.

You might want to test things by looking at the device state with powermt display dev=all prior to the LPM event.  Note the devices and their path states.  Then execute the migration and check the powermt display dev=all output again.  I suspect you will need to run powermt restore to recheck paths to disk, then powermt config to configure new paths.

Let us know if that helps!

Karl

21 Posts

April 20th, 2016 00:00

Temporary dead paths are an expected condition during an LPM operation.

As per EMC Engineering: "the original HBAs switch to the alternate HBAs, thereby causing a [temporary] failed path".

Setting reserve_policy to no_reserve is mandatory for all of the hdiskpower devices as well as their underlying hdisks.

There should be no need to run a powermt restore, as the paths should recover automatically.  However, as Karl stated, good housekeeping would dictate that you run a powermt display dev=all before and after the LPM operation to verify path integrity.
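If you want to make the before/after comparison less manual, something along these lines might help. It is only a rough sketch: it assumes you have saved the two powermt display dev=all outputs to text files yourself (the file names in the usage note are made up), and it makes no assumption about the output layout beyond it being plain text. The function name changed_lines is just for illustration.

```python
import difflib


def changed_lines(before, after):
    """Return the lines that differ between two saved text captures.

    Lines only present in the first capture are prefixed with "- ",
    lines only present in the second with "+ ". An empty list means
    the two captures are identical.
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff also emits "  " (unchanged) and "? " (hint) lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    import sys

    # Usage: python compare_paths.py pp_before.txt pp_after.txt
    # (both file names are hypothetical captures you created yourself)
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
            for line in changed_lines(f_before.read(), f_after.read()):
                print(line)
```

Any counters included in the output may differ between the two captures even when every path has recovered, so treat a non-empty result as a prompt to look closer rather than as proof of a problem.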

The following EMC Knowledge Base articles have some further information on potential LPM issues:

000454413

000439567

000439265

4 Posts

April 20th, 2016 05:00

Hi Karl - We're running AIX 7.1 - PP 6.0 & ODM 6.0.0.5. Also, we're running our LPARs through a 4 engine VPLEX to a VMAX & also we use NPIV. We have 8 paths to each VMAX lun. We run a powermt display before & after the migration and both are all clean. Our concern is that sometimes all 8 paths show as offline for 1 second or so in the errpt.

4 Posts

April 29th, 2016 08:00

Thanks Karl -> powermt display looks fine before & after the LPM.  We're just concerned about the errpt messages.  IBM says we should only see migration started/migration completed.  I guess we're wondering what do other people see in errpt using PP?

278 Posts

May 9th, 2016 01:00

Hi Halt2000,

When migrating an LPAR from one server to another, there is a period (<1 second) when the original WWNs switch to the backup pairs, thereby causing a very brief path down condition.

Once PP receives a failed IO from the underlying SCSI driver layer, it will report the path as dead and commence path testing.

Once the path is recovered, PP will mark the path status as alive/active.

So in an LPM scenario such as this, PP is working exactly as expected.
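One way to check that the path errors really are confined to the migration is to parse the errpt summary and confirm every non-pmig entry falls between the "Migration Started" and "Migration Completed" records. The sketch below uses a few lines from the errpt listing in the original post; the function names are made up for illustration, and the timestamp is read as MMDDhhmmYY, so the check is only accurate to the minute.

```python
from datetime import datetime

# A few lines copied from the errpt summary in the original post.
ERRPT = """\
A5E6DB96   0414083416 I S pmig           Client Partition Migration Completed
1BBD20F4   0414083416 I H hdiskpower3    FUNCTION DEGRADED
C6E26F3B   0414083416 I H hdisk19        BACK-UP PATH STATUS CHANGE
1E67811B   0414083316 P H hdisk19        UNABLE TO COMMUNICATE WITH DEVICE
516A2BC4   0414083316 P H hdisk9         CONNECTION FAILURE
516A2BC4   0414083316 P H hdisk1         CONNECTION FAILURE
08917DC6   0414083316 I S pmig           Client Partition Migration Started
"""


def parse_errpt(text):
    """Turn errpt summary lines into dicts; skip anything unparseable."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue
        ident, stamp, etype, eclass, resource, desc = parts
        try:
            when = datetime.strptime(stamp, "%m%d%H%M%y")  # MMDDhhmmYY
        except ValueError:
            continue  # e.g. the column header line
        entries.append({"id": ident, "time": when, "type": etype,
                        "class": eclass, "resource": resource,
                        "desc": desc.strip()})
    return entries


def outside_migration_window(entries):
    """Return non-pmig entries logged before the start or after the end."""
    pmig = [e for e in entries if e["resource"] == "pmig"]
    start = min(e["time"] for e in pmig if "Started" in e["desc"])
    end = max(e["time"] for e in pmig if "Completed" in e["desc"])
    return [e for e in entries
            if e["resource"] != "pmig" and not (start <= e["time"] <= end)]


if __name__ == "__main__":
    stray = outside_migration_window(parse_errpt(ERRPT))
    for e in stray:
        print(e["time"], e["resource"], e["desc"])
    print(len(stray), "entries outside the migration window")
```

Entries that show up outside the window, or path errors with no matching recovery, would be the ones worth raising with support.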

1 Message

June 14th, 2016 13:00

How do you explain the fact that during some LPM migrations we see database disk errors in the database logs and UNABLE TO COMMUNICATE WITH DEVICE messages in the AIX errpt?
