March 6th, 2009 14:00

Service Times and queues

Hoping to get some consensus on some service times and queue observations we have been seeing for a while now. We have an HP-UX 11.23 system (rx8640) attached to a DMX-3 over 4 FC paths, with 4 Gb HBAs, 4 Gb fabric ports, and 4 Gb FAs. We're running PowerPath 5.1.0 on the HP-UX OS.

Our tool for measuring service times and queues has been sar ("sar -d"), and we're collecting samples every 10 seconds. We are seeing queues greater than 4, as well as service times on some devices upwards of 150 ms and sometimes as high as 300 ms.

Optimizer is running on the DMX-3, but it is not reporting any hotspots. EMC support is now looking at the logs from ECC. My question is whether we should interpret these sar observations as real issues, since we're seeing them sporadically and not constantly. The only constant is that the elevated sar service times and disk queues show up every day for some period of time, which seems to coincide with known peak utilization times of our application and system.

Any thoughts on this would be greatly appreciated...

KPS

1.3K Posts

March 7th, 2009 12:00

The service time from "sar -d" is blended across metas, so it is better used as a trend than as the exact service time of the disk. But it definitely gives an indication of the expected service time.

Can you run "powermt watch" at the times you see the high values? It can indicate whether the IOs are getting queued (Q-IOs) at the host/HBA level. Also, what value do you have for scsi_max_qdepth at the kernel level?

8 Posts

March 9th, 2009 06:00

Our scsi_max_qdepth kernel param is set to 32. We can try this powermt watch, but I think it's going to be tough to correlate what we see here to what's in sar.

1.3K Posts

March 15th, 2009 09:00

If you see "Q-IO" then there is definitely a bottleneck.

8 Posts

March 16th, 2009 18:00

powermt watch was somewhat helpful, but we're looking at sar data here on the host side and feel we're getting more detailed info on %busy, queues, and service times from sar.

We do see service times above 30ms and at times queues in the area of 5 or 6.
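One way to judge whether these spikes are sporadic or tied to peak hours is to count, per hour, what fraction of the 10-second samples exceed a threshold. The following is only a minimal sketch: it assumes the sar output has already been parsed into dictionaries, and the field names ("time", "avque", "avserv"), the function name, and the sample values are all made up for illustration. The default limits of 30 ms and a queue of 4 are simply the figures mentioned in this thread, not recommended values.

```python
from collections import defaultdict


def exceedance_by_hour(samples, serv_ms_limit=30.0, queue_limit=4.0):
    """Return {hour: fraction of samples over either limit}.

    `samples` is a list of dicts with hypothetical keys:
      "time"   - "HH:MM:SS" string of the sample
      "avque"  - average queue length reported for the interval
      "avserv" - average service time in milliseconds
    """
    totals = defaultdict(int)
    over = defaultdict(int)
    for s in samples:
        hour = int(s["time"].split(":")[0])
        totals[hour] += 1
        # A sample counts as "over" if either metric exceeds its limit.
        if s["avserv"] > serv_ms_limit or s["avque"] > queue_limit:
            over[hour] += 1
    return {h: over[h] / totals[h] for h in sorted(totals)}


# Illustrative, made-up samples (not real measurements).
samples = [
    {"time": "09:00:10", "avque": 0.5, "avserv": 6.0},
    {"time": "09:00:20", "avque": 5.5, "avserv": 150.0},
    {"time": "14:00:10", "avque": 0.5, "avserv": 5.0},
    {"time": "14:00:20", "avque": 0.6, "avserv": 7.0},
]
result = exceedance_by_hour(samples)
print(result)
```

If the high fractions cluster in the same hours every day, that would support the view that the spikes follow application peaks rather than occurring at random.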

May 17th, 2010 01:00

Hi,

Just a note about "sar -d" outputs. Please make sure you are seeing the "real world" by deleting any sar records that show no IO but a service time, and also records showing less than 1 IO/sec. These records are usually false values and skew the sar results. This is the "tail" end of the normal "top-and-tail" that people do to sar data to remove misleading records. The "top" refers to deleting the top few percent (highest response times) of the remaining records, which are often also false values. Whether to use the "top" depends on how much data there is.
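The top-and-tail cleanup described above could be sketched roughly as follows. This is an illustration under assumptions, not a standard tool: the records are assumed to be already parsed into dictionaries, the keys "iops" and "avserv", the function name, and the default 2% top fraction are all hypothetical choices, and the sample values are made up.

```python
def top_and_tail(records, min_iops=1.0, top_fraction=0.02):
    """Drop misleading sar records.

    Tail: remove records below `min_iops` (this also covers records
          with no IO but a non-zero service time).
    Top:  remove the highest-service-time `top_fraction` of what is left.

    Returns the surviving records sorted by ascending service time.
    """
    # Tail: keep only records with meaningful IO activity.
    kept = [r for r in records if r["iops"] >= min_iops]
    # Top: sort by service time and cut the worst few percent.
    kept.sort(key=lambda r: r["avserv"])
    n_drop = int(len(kept) * top_fraction)  # truncates to 0 on small data sets
    return kept[:len(kept) - n_drop] if n_drop else kept


# Illustrative, made-up records (not real measurements).
records = [
    {"iops": 0.0, "avserv": 250.0},   # no IO but a service time
    {"iops": 0.5, "avserv": 180.0},   # less than 1 IO/sec
    {"iops": 120.0, "avserv": 6.0},
    {"iops": 95.0, "avserv": 8.0},
    {"iops": 110.0, "avserv": 7.0},
    {"iops": 80.0, "avserv": 40.0},
]
cleaned = top_and_tail(records, top_fraction=0.25)
print([r["avserv"] for r in cleaned])
```

Because the number of records to drop is truncated to an integer, a small data set with a small `top_fraction` loses nothing from the top, which matches the point that the "top" step only makes sense when there is enough data.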

Please note that queued IOs in PowerPath are a good thing for parallel threads on an HBA, and a not-so-good thing for single-threaded operations. Queued IOs are normally only an issue if the application IO is "too slow" in responding. EMC provides Performance HealthCheck services to clarify such situations as required.
