Symptoms
How to troubleshoot ECS performance issues.
Performance-related issues:
- One or more types of operations are slow. This is the most common issue, because the problem is noticed from end applications.
- One or more nodes are slow or have high resource utilization. This may be flagged by ECS monitoring or noticed during normal usage.
Cause
Multiple factors can contribute, such as network-related issues, high load, and usage patterns.
Resolution
Questions to consider when experiencing performance degradation. (Expect Technical Support to request answers to the questions below.)
- What behavior or symptoms are being experienced that indicate a performance problem?
- What is the impact of the problem?
- When was the performance problem first noticed?
Has anything changed recently in the environment?
- Software or hardware?
- Load?
- Network?
- Firewall?
- Load Balancer?
- Can the problem be expressed in terms of latency or run time?
- What is the environment? Software and hardware being used? Versions? Configuration? Customer application?
- What is the average file size (large and small files)?
- Are reads, writes, deletes, or updates affected? Or are all methods affected?
- Can data be read or written using other access methods? Is one specific application affected, or are all applications affected?
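To answer the latency question with numbers rather than impressions, it can help to time the same operation repeatedly and report a few summary statistics. The following is a minimal sketch, not an ECS tool: `measure_latency` is a hypothetical helper, and `operation` stands for whatever single read or write your own client issues.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency(operation: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Time `operation` `runs` times and summarize the latencies in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()  # hypothetical: one read or write issued by your own client
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile: smallest sample with >= 95% of samples at or below it.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real client call.
    print(measure_latency(lambda: time.sleep(0.01), runs=5))
```

Running the same measurement for small and large objects, and for reads and writes separately, gives figures that map directly onto the questions above.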
What access method is being used:
- S3
- Swift
- Centera SDK (CAS). Is there a transformation/ECSsync or other migration in progress?
- File System access:
- Windows using CIFS (GeoDrive)
- Linux using NFS
What is the application access pattern?
- POST creates and/or renames objects.
- GET retrieves object data.
- PUT updates object attributes.
- DELETE removes objects and metadata from the system.
- HEAD corresponds to each GET method. A HEAD request looks exactly like a GET request except that the method name is HEAD instead of GET. The response to a HEAD request includes only headers; it does not include a response body.
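One way to describe the access pattern is as the share of each HTTP method in the application's traffic. The sketch below tallies that mix from request lines. The input format (method first, then path, separated by whitespace) is an assumption made for illustration; real application logs need their own parsing, and `method_mix` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable

KNOWN_METHODS = ("GET", "HEAD", "PUT", "POST", "DELETE")


def method_mix(request_lines: Iterable[str]) -> Dict[str, float]:
    """Return the percentage share of each HTTP method among the request lines."""
    counts: Counter = Counter()
    for line in request_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        method = parts[0].upper()
        # Anything that is not one of the five methods above is grouped as OTHER.
        counts[method if method in KNOWN_METHODS else "OTHER"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {method: 100.0 * n / total for method, n in counts.items()}


if __name__ == "__main__":
    # Assumed sample input, not taken from a real ECS or application log.
    sample = ["GET /bucket/a", "GET /bucket/b", "HEAD /bucket/a", "PUT /bucket/c"]
    print(method_mix(sample))
```

A mix dominated by HEAD and GET points to a read-heavy workload, while a high share of PUT, POST, or DELETE points to a write-heavy one, which helps narrow down which operations to test.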
- Is your application connecting to all nodes or only to some individual nodes? Is a load balancer configured?
- Can the namespace, bucket, secret, and UID of the affected application be supplied so that we can perform similar tests on the ECS?
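For the question about individual nodes versus a load balancer, comparing latencies measured against each endpoint separately can show whether one node stands out. The following is a rough sketch under stated assumptions: the endpoint names, the sample values, and the threshold `factor` are all hypothetical, and the latencies are assumed to come from your own client-side tests.

```python
import statistics
from typing import Dict, List


def find_slow_endpoints(latencies_ms: Dict[str, List[float]], factor: float = 2.0) -> List[str]:
    """Return endpoints whose median latency exceeds `factor` times the
    median of all per-endpoint medians."""
    medians = {
        name: statistics.median(samples)
        for name, samples in latencies_ms.items()
        if samples  # ignore endpoints with no measurements
    }
    if len(medians) < 2:
        return []  # nothing to compare against
    baseline = statistics.median(medians.values())
    return sorted(name for name, value in medians.items() if value > factor * baseline)


if __name__ == "__main__":
    # Assumed example measurements in milliseconds.
    samples = {
        "node1": [10, 12, 11],
        "node2": [11, 13, 12],
        "node3": [50, 55, 60],
        "load-balancer": [12, 14, 13],
    }
    print(find_slow_endpoints(samples))
```

If every endpoint is slow by a similar amount, no endpoint is flagged, which suggests looking at load, network, or usage patterns rather than at a single node.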
Information to collect on ECS when opening a service request (SR) with Technical Support, to help narrow down the problem:
- Application logs from the affected application. Logs specific to a failed or delayed request are the most beneficial.
- Any observations from the ECS user interface, such as a node offline or other critical failures shown on the dashboard.
- Answers to questions from previous section
- An xDoctor run, if the affected system does not have dial-home capability.
Affected Products
Elastic Cloud Storage
Products
ECS Appliance, Elastic Cloud Storage