VNX/VNXe: LUNs showing high response times without overloaded drives (User Correctable)

Summary: Troubleshooting LUNs with high response times when there are no overloaded drives.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

  • LUNs show high response times (latency) although no drives in the underlying RAID Group or Storage Pool are overloaded.

Cause

Response times are calculated by dividing the I/O queue length by the throughput, following Little's Law. When the throughput is low (typically under 100 IOPS), surprisingly high response time values can be recorded. These can be ignored unless the host's performance monitoring also records high response times for the same LUN. To check whether there is a real issue, examine other LUNs that share the same RAID Group or Pool to see whether they showed high response times over the same period.
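As an illustration of how low throughput inflates the reported figure, here is a minimal sketch of the Little's Law calculation described above (the numbers are hypothetical, not taken from any array):

```python
def avg_response_time_ms(queue_length, iops):
    """Little's Law: average response time = queue length / throughput.

    queue_length: average number of outstanding I/Os on the LUN
    iops: average throughput over the polling interval
    Returns the average response time in milliseconds.
    """
    if iops <= 0:
        raise ValueError("throughput must be positive")
    return (queue_length / iops) * 1000.0  # seconds -> milliseconds

# A busy LUN: queue of 8 at 4000 IOPS -> 2 ms, perfectly healthy.
print(avg_response_time_ms(8, 4000))  # 2.0

# A nearly idle LUN: the same queue of 8 at only 50 IOPS is reported
# as 160 ms, even though the drives are not doing much work at all.
print(avg_response_time_ms(8, 50))    # 160.0
```

The second result shows why a high response time at very low throughput is often a reporting artifact rather than a real problem.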

Another way to confirm whether a high response time figure is a reporting issue is to compare graphs of host response times (for example, from Windows Task Manager or Perfmon) with the same LUN on the array as seen in Unisphere Analyzer. If the host and array response times peak simultaneously, that is likely a genuine issue which needs further investigation; otherwise it is likely an anomaly which can be ignored.
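The peak comparison above can be sketched in code. This is a hypothetical helper, assuming both the host and array response times have been exported as equal-interval samples (it is not a feature of Perfmon or Unisphere Analyzer):

```python
def peaks_coincide(host_ms, array_ms, threshold_ms=20.0, tolerance=1):
    """Return True if every array response-time spike above threshold_ms
    has a matching host spike within `tolerance` samples of it.

    host_ms, array_ms: equal-length lists of response times (ms),
    sampled over the same period at the same interval.
    """
    spikes = [i for i, v in enumerate(array_ms) if v > threshold_ms]
    if not spikes:
        return False  # no array spikes: nothing to investigate
    for i in spikes:
        window = host_ms[max(0, i - tolerance): i + tolerance + 1]
        if not any(v > threshold_ms for v in window):
            return False  # array spike with no host spike: likely an anomaly
    return True

host = [5, 6, 40, 7, 5, 5]    # host-side samples (ms)
array = [6, 7, 45, 8, 6, 6]   # array-side samples for the same LUN (ms)
print(peaks_coincide(host, array))  # True: peaks line up, investigate further
```

When the function returns False, the array-side spike had no host-side counterpart, which matches the "anomaly which can be ignored" case in the text.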

There can be several causes and solutions for high response times. The issues that are easy to detect from NAR or NAZ files include:

  • Forced flushing of write cache (article 10256).

  • High SP Utilization (article 11592).

  • Overloaded disk drives in a RAID Group or Storage Pool (article 52084).

  • Overloaded FAST cache drives (article 73184).

  • Qfull events (article 53727).

## IMPORTANT NOTE ##
If the QoS (Quality of Service) functionality is enabled on the array, and depending on the limits set, high queue lengths and response times may be experienced. This is by design, as the feature aims to limit the overall amount of I/O to be processed by Storage.
 
QoS (Quality of Service) Host I/O Limits:
 
QoS is designed to limit I/O on Block LUNs, attached Snapshots, or VMFS Datastores (not File resources).
The purpose is to apply rate limiting by I/O, bandwidth, or both, so that specific resources do not overconsume.
The use case is to avoid the "noisy neighbor" problem in shared host environments. Limits are applied as a ceiling which cannot be exceeded.
Limits can be set by Max IOPS, by Bandwidth (KBPS or MBPS), or both. If both limit types are set, traffic is limited by whichever threshold is reached first:
- Minimum setting for Max BW (KBPS) on a LUN is 50 KBPS.
- Minimum setting for Max IOPS on a LUN is 100 IOPS.
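To illustrate "limited by whichever threshold is reached first" when both limit types are set, here is a hedged sketch (the function and numbers are illustrative, not the array's actual QoS algorithm):

```python
def effective_iops(offered_iops, io_size_kb, max_iops=None, max_kbps=None):
    """Illustrative only: which QoS ceiling clamps the workload first.

    offered_iops: the rate the host is trying to send
    io_size_kb: average I/O size in KB
    max_iops / max_kbps: the configured Host I/O Limits (either or both)
    Returns the IOPS actually allowed through.
    """
    allowed = offered_iops
    if max_iops is not None:
        allowed = min(allowed, max_iops)
    if max_kbps is not None:
        # Express the bandwidth ceiling in IOPS terms for this I/O size.
        allowed = min(allowed, max_kbps / io_size_kb)
    return allowed

# 8 KB I/Os offered at 5000 IOPS, limits of 4000 IOPS and 16000 KBPS:
# the bandwidth ceiling (16000 / 8 = 2000 IOPS) is reached first.
print(effective_iops(5000, 8, max_iops=4000, max_kbps=16000))  # 2000.0
```

Note that with a larger average I/O size the bandwidth limit bites sooner, while small I/Os tend to hit the IOPS limit first.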

Also, be aware that the QoS functionality is not meant to be a long-term solution and should not replace thorough workload and environment sizing before deployment.

Resolution

The events above can be seen by performing a performance health check as detailed in article 15335. There are, however, other issues that are harder to spot, including:

  • Read or write starvation. This occurs when the front-end ports are congested with large I/O operations, so responses for other LUNs get held up in the front-end port queues. One way to avoid this is to move hosts with large read or write block sizes to other front-end ports (article 91962).

  • Short bursts of I/O. On average the throughput can look low, but some hosts and applications can send large amounts of I/O in a short burst and then go back to low levels of I/O again. As the Analyzer statistics are averages over the polling interval, these bursts can be hard to spot. LUNs with a high Average Busy Queue Length (ABQL) may have this issue. Spreading the host load over more LUNs can help to absorb large I/O bursts. Some applications can divide the workload between multiple LUNs, or they can be striped in the host's file-system. However, these LUNs used in parallel should not share the same drives, because that would cause linked contention (article 49371). The HBA queue length can also be adjusted to prevent the host from sending all the I/O at once, as detailed in article 53727.

  • SCSI reservation conflicts. These are often caused by LUNs presented in multiple Storage Groups with inconsistent HLU values (article 41822).

  • iSCSI connectivity issues. See article 71615 for iSCSI troubleshooting.
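The short-burst issue described above, where interval averaging hides a spike, can be illustrated with hypothetical per-second samples (the numbers and the 60-second polling interval are assumptions for the example, not Analyzer output):

```python
# One 60-second Analyzer polling interval: a 5-second burst of 3000 IOPS
# inside an otherwise nearly idle minute of 20 IOPS.
samples = [20] * 55 + [3000] * 5

interval_average = sum(samples) / len(samples)  # what Analyzer would report
peak = max(samples)                             # what the host actually sent

print(round(interval_average, 1))  # ~268 IOPS: looks moderate on the graph
print(peak)                        # 3000 IOPS: the burst the average hides
```

This is why a LUN with a modest average throughput but a high Average Busy Queue Length (ABQL) deserves a closer look.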

Affected Products

VNX/VNXe

Products

CLARiiON CX3 Ultrascale Series, CLARiiON CX4 Series, VNX1 Series, VNX2 Series, VNX5100, VNX5200, VNX5300, VNX5400, VNX5500, VNX5600, VNX5700, VNX5800, VNX7500, VNX7600, VNX8000, VNX/VNXe
Article Properties
Article Number: 000078040
Article Type: Solution
Last Modified: 30 Jul 2021
Version:  4