PowerPath only logs entries to the syslog (HP-UX), messages (Solaris/Linux) and errpt (AIX) files. As you said, the error counters in the powermt display output are a historical record of the number of path state changes. When I'm troubleshooting, my steps are usually as follows:
1. Check the powermt output for the error counters and see if there is a commonality (all errors on one HBA, on one FA, etc.).
2. Check the uptime on the host so you can get an idea of how frequently these errors occur.
3. Search the syslog/messages/errpt for the string "dead" or "path state change", starting from the bottom (most recent) and working backwards, to see what caused the path to die.
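Steps 1 and 2 can be sketched in Python as a simple tally. This is a minimal sketch, assuming you have already copied each path's HBA, FA port and error count out of `powermt display dev=all` by hand; the record layout and the HBA/FA names below are hypothetical, and the code makes no attempt to parse real powermt output.

```python
from collections import Counter


def find_commonality(paths):
    """Total the error counts per HBA and per FA port.

    `paths` is a list of hypothetical (hba, fa_port, error_count) tuples,
    assumed to be transcribed from powermt output; not a real powermt format.
    """
    by_hba = Counter()
    by_fa = Counter()
    for hba, fa, errors in paths:
        by_hba[hba] += errors
        by_fa[fa] += errors
    return by_hba, by_fa


def errors_per_day(total_errors, uptime_days):
    """Rough error frequency: total path errors divided by host uptime in days."""
    if uptime_days <= 0:
        raise ValueError("uptime_days must be positive")
    return total_errors / uptime_days


if __name__ == "__main__":
    # Illustrative values only: errors concentrated on hba0 across both FAs
    # would point at that HBA (or its cable/switch port) rather than the array.
    paths = [
        ("hba0", "FA-7A", 12),
        ("hba0", "FA-8A", 9),
        ("hba1", "FA-7A", 0),
        ("hba1", "FA-8A", 0),
    ]
    by_hba, by_fa = find_commonality(paths)
    print(by_hba.most_common())  # [('hba0', 21), ('hba1', 0)]
    print(by_fa.most_common())   # [('FA-7A', 12), ('FA-8A', 9)]
    print(errors_per_day(sum(by_hba.values()), uptime_days=7))  # 3.0
```

Because the counters are cumulative since boot, dividing by uptime only gives an average; a burst of errors last night looks the same as a steady trickle, which is why step 3 goes to the logs for timestamps.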
From the above information you can usually identify the failing component. If there is still doubt, move on to the switches and work your way back through the fabric to the FA port on the array.
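The log search in step 3 can be sketched as a newest-first scan. This minimal sketch assumes the log is a plain text file in chronological order; on AIX, errpt is a command rather than a text file, so you would save the output of `errpt -a` to a file first. The search strings are the ones from the post, and the exact PowerPath message wording may differ by platform and version.

```python
import re

# Strings suggested in the post; treat them as a starting point, since the
# exact message text is an assumption that may vary by platform/version.
PATTERNS = re.compile(r"dead|path state change", re.IGNORECASE)


def recent_path_events(lines, limit=20):
    """Return up to `limit` matching lines, most recent first.

    Assumes `lines` is in file order (oldest first), as in syslog/messages.
    """
    hits = []
    for line in reversed(lines):  # start from the bottom (most recent)
        if PATTERNS.search(line):
            hits.append(line.rstrip("\n"))
            if len(hits) >= limit:
                break
    return hits


def scan_file(path, limit=20):
    """Read a whole log file and return its most recent path events."""
    # Reads everything into memory; fine for a typical rotated messages file.
    with open(path, errors="replace") as f:
        return recent_path_events(f.readlines(), limit)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for event in scan_file(sys.argv[1]):
            print(event)
```

Typical default log locations are /var/adm/syslog/syslog.log (HP-UX), /var/adm/messages (Solaris) and /var/log/messages (many Linux distributions), though the syslog configuration can change them.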
For more real-time monitoring, you could ask the sysadmins to monitor the host's syslog for strings such as "Killing Bus", which is the message you see when all paths to an FA/SP die or all paths from an HBA die, and to alert in real time when these occur.
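If the sysadmins don't already have a log-monitoring tool, the idea can be sketched as a small `tail -f`-style watcher. This is a minimal sketch, assuming the log is a plain text file the script can read; it polls for new lines, does not handle log rotation, and `alert()` is a hypothetical hook that only prints.

```python
import time

ALERT_STRING = "Killing Bus"  # string quoted in the post


def matches_alert(line, needle=ALERT_STRING):
    """True if the log line contains the alert string."""
    return needle in line


def follow(path, poll_seconds=1.0):
    """Yield complete lines appended to `path` after watching starts.

    Simplified sketch: no handling of log rotation or truncation.
    """
    with open(path, errors="replace") as f:
        f.seek(0, 2)  # jump to end of file; only report new entries
        buf = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_seconds)
                continue
            buf += chunk
            if buf.endswith("\n"):  # wait until the writer finishes the line
                yield buf
                buf = ""


def alert(line):
    # Hypothetical hook: replace with mail, an SNMP trap, a pager, etc.
    print("ALERT:", line.rstrip("\n"))


def watch(path):
    """Follow the log forever and alert on every matching line."""
    for line in follow(path):
        if matches_alert(line):
            alert(line)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        watch(sys.argv[1])
```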
Conor
December 22nd, 2008 23:00