4 Posts
0
5468
PowerPath AIX LPM and errpt message question
We had a couple of servers hang or crash during LPMs recently, and IBM and EMC support suggested we change the reserve_policy of all our LPAR hdisks and hdiskpowers to no_reserve, which we did. Now when we LPM a test LPAR, we sometimes see a CONNECTION FAILURE errpt message for every hdisk under a hdiskpower, plus a FUNCTION DEGRADED errpt message for the hdiskpower itself (see below). EMC says not to be concerned; these are just informational messages. My boss says losing all paths to a hdiskpower device can't be good. Which errpt messages are 'normal' when running LPM with PowerPath?
A5E6DB96 0414083416 I S pmig Client Partition Migration Completed
1BBD20F4 0414083416 I H hdiskpower3 FUNCTION DEGRADED
C6E26F3B 0414083416 I H hdisk19 BACK-UP PATH STATUS CHANGE
1E67811B 0414083316 P H hdisk19 UNABLE TO COMMUNICATE WITH DEVICE
516A2BC4 0414083316 P H hdisk9 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk1 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk14 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk19 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk17 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk6 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk2 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk12 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk13 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk5 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk16 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk18 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk11 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk0 CONNECTION FAILURE
516A2BC4 0414083316 P H hdisk8 CONNECTION FAILURE
08917DC6 0414083316 I S pmig Client Partition Migration Started
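For anyone who wants to compare their own LPM runs against the listing above, here is a minimal Python sketch that groups the errpt summary lines logged between the "Migration Started" and "Migration Completed" events. It assumes the six-column summary layout shown above (identifier, MMDDhhmmYY timestamp, type, class, resource, description); the function names are made up for illustration and are not part of any IBM or EMC tool.

```python
from collections import Counter
from datetime import datetime


def parse_errpt_line(line):
    """Split one errpt summary line into its six columns."""
    ident, stamp, etype, eclass, resource, description = line.split(None, 5)
    return {
        "id": ident,
        # Summary timestamps are MMDDhhmmYY, so resolution is one minute.
        "time": datetime.strptime(stamp, "%m%d%H%M%y"),
        "type": etype,
        "class": eclass,
        "resource": resource,
        "description": description.strip(),
    }


def events_during_migration(lines):
    """Return entries logged between the pmig Started and Completed events."""
    entries = [
        parse_errpt_line(line)
        for line in lines
        if line.strip() and not line.startswith("IDENTIFIER")  # skip header
    ]
    entries.reverse()  # errpt lists newest first; walk in chronological order
    inside = False
    window = []
    for entry in entries:
        is_pmig = entry["resource"] == "pmig"
        if is_pmig and "Started" in entry["description"]:
            inside = True
        elif is_pmig and "Completed" in entry["description"]:
            break
        elif inside:
            window.append(entry)
    return window


def summarize(window):
    """Count how many entries of each description fall inside the window."""
    return Counter(entry["description"] for entry in window)
```

Because the summary timestamp only resolves to the minute, this shows which messages fall inside the migration window but cannot confirm how long the paths were actually down.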
umichklewis
1.2K Posts
1
April 19th, 2016 10:00
Incidentally, which version of PP are you on and which ODM files do you have installed?
PowerPath uses a unique identifier for each detected device-path pair. With LPM, you migrate the disk from one set of logged-in host initiators to another set of host initiators. From PowerPath's perspective, the host device is no longer accessible on the current device-path, so it marks that path unavailable. PowerPath reports these errors to the errpt facility, where they are flagged as such.
You might want to test things by looking at the device state with powermt display dev=all prior to the LPM event. Note the devices and their path states. Then, execute the migration and check powermt display dev=all output again. I suspect you will need to run powermt restore to recheck paths to disk, then powermt config to configure new paths.
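One way to make the before/after comparison less error-prone is to save both powermt display dev=all outputs to text files and diff them. The Python sketch below is only an illustration: it compares the two captures line by line with the standard difflib module and makes no assumption about the column layout of the powermt output. The function name is hypothetical.

```python
import difflib


def path_state_changes(before, after):
    """Return the lines that differ between two saved captures.

    Lines only in the 'before' capture are prefixed with '- ',
    lines only in the 'after' capture with '+ '.
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # Keep removed/added lines; drop unchanged lines and '? ' hint lines.
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

An empty result means the two captures are identical. Note that counters in the output (for example queued I/O or error counts) can change between captures even when every path is healthy, so any reported difference still needs a look by eye.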
Let us know if that helps!
Karl
robh
21 Posts
1
April 20th, 2016 00:00
Temporary dead paths are an expected condition during an LPM operation.
As per EMC Engineering: "the original HBAs switch to the alternate HBAs, thereby causing a [temporary] failed path".
Setting reserve_policy to no_reserve is mandatory for all of the hdiskpower devices as well as their underlying hdisks.
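As a rough sketch of how that requirement could be audited, the Python snippet below takes a mapping of device name to its current reserve_policy value (collected, for example, with lsattr -El <device> -a reserve_policy -F value) and lists the devices that still need changing, along with the matching chdev command lines. The function names are hypothetical, and whether chdev can be applied while a device is in use depends on your environment, so treat the generated commands as a checklist rather than something to run blindly.

```python
def devices_to_fix(policies, wanted="no_reserve"):
    """Return the devices whose reserve_policy is not the wanted value."""
    return sorted(
        dev for dev, value in policies.items() if value.strip() != wanted
    )


def chdev_commands(devices, wanted="no_reserve"):
    """Build one chdev command line per device (strings only, nothing is run)."""
    return [f"chdev -l {dev} -a reserve_policy={wanted}" for dev in devices]
```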
There should be no need to run a powermt restore, as the paths should recover automatically. However, as Karl stated, good housekeeping would dictate that you run a powermt display dev=all before and after the LPM operation to verify path integrity.
The following EMC Knowledge Base articles have some further information on potential LPM issues:
000454413
000439567
000439265
halt2000
4 Posts
0
April 20th, 2016 05:00
Hi Karl - We're running AIX 7.1 - PP 6.0 & ODM 6.0.0.5. Also, we're running our LPARs through a 4 engine VPLEX to a VMAX & also we use NPIV. We have 8 paths to each VMAX lun. We run a powermt display before & after the migration and both are all clean. Our concern is that sometimes all 8 paths show as offline for 1 second or so in the errpt.
halt2000
4 Posts
0
April 29th, 2016 08:00
Thanks Karl -> powermt display looks fine before & after the LPM. We're just concerned about the errpt messages. IBM says we should only see migration started/migration completed. I guess we're wondering what other people see in errpt using PP?
Zikas
278 Posts
0
May 9th, 2016 01:00
Hi Halt2000,
When migrating an LPAR from one server to another, there is a period (<1 second) when the original WWNs switch to the backup pairs, thereby causing a very brief path down condition.
Once PP receives a failed IO from the underlying SCSI driver layer, it will report the path as dead and commence path testing.
Once the path is recovered, PP will mark the path status as alive/active.
So in an LPM scenario such as this, PP is working exactly as expected.
jeffkawa1
1 Message
0
June 14th, 2016 13:00
How do you explain the fact that during some LPM migrations we see database disk errors in the database logs and UNABLE TO COMMUNICATE WITH DEVICE messages in the AIX errpt?