How to Troubleshoot and Resolve Memory Errors Within a Unified Computing System Environment

Summary: This article details how to troubleshoot and resolve memory errors within a Cisco Unified Computing System (UCS) environment.

Instructions

Error Identification:

  • Review the 'Faults' tab within UCS Manager to determine whether memory errors are present and what their impact is.
  • Capture UCSM and Chassis logs from the affected server BEFORE starting any troubleshooting. This preserves historical data, so you can tell whether the errors return after troubleshooting.


Error Confirmation:
Once errors are identified, clear them all and monitor the error counters to see whether the errors persist.

  1. Log in to the UCS command line.
  2. Reset memory errors using the following commands:

CLI# scope server X/Y
CLI# reset-all-memory-errors
CLI# commit-buffer

  3. Clear System Event Logs using the following commands:

CLI# scope server X/Y
CLI# clear sel
CLI# commit-buffer

  4. Reset CIMC using the following commands:

CLI# scope server X/Y
CLI# scope cimc
CLI# reset
CLI# commit-buffer

  5. Monitor the environment for 48 hours.
If memory errors persist, capture a fresh set of UCSM and Chassis logs, and go to the next section. 
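
When several servers are affected, the command sequences above can be generated per server instead of being retyped. The following Python sketch only assembles the command text shown in this article; it does not connect to UCS Manager or run anything. The function name `build_reset_commands` and its parameters are hypothetical, with `chassis` and `blade` standing in for the X/Y placeholders.

```python
def build_reset_commands(chassis: int, blade: int) -> list[str]:
    """Return the CLI lines from this article for one server (chassis/blade).

    Hypothetical helper: it formats text only and sends nothing to UCS.
    """
    server = f"scope server {chassis}/{blade}"
    return [
        # Reset the memory error counters
        server, "reset-all-memory-errors", "commit-buffer",
        # Clear the System Event Log
        server, "clear sel", "commit-buffer",
        # Reset the CIMC
        server, "scope cimc", "reset", "commit-buffer",
    ]


if __name__ == "__main__":
    # Example values only: chassis 1, blade 3
    for line in build_reset_commands(1, 3):
        print(line)
```

Review the printed lines before pasting them into a UCS command-line session.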


Physical Troubleshooting:
Before a DIMM is replaced, determine whether the errors are related to the DIMM socket, the DIMM itself, or the CPU.

This is done by swapping the hardware components and monitoring the environment. Instructions are provided below:
  1. Put the ESXi host in maintenance mode.
  2. Swap the faulted DIMMs with DIMMs that were not previously showing any issues.
  3. Reboot the server and keep it in maintenance mode.
  4. Monitor the server for 48 hours to see whether the issue recurs.
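
The 48-hour monitoring period can be tracked with a small check like the Python sketch below. It assumes you have already collected the timestamps of memory error events by hand (for example, from the faults view or the captured logs); how they are collected is outside this sketch, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

MONITOR_WINDOW = timedelta(hours=48)  # monitoring period used in this article


def errors_in_window(swap_time, error_times, window=MONITOR_WINDOW):
    """Return, in order, the error timestamps inside the window after swap_time."""
    end = swap_time + window
    return sorted(t for t in error_times if swap_time <= t <= end)


def monitoring_complete(swap_time, now, window=MONITOR_WINDOW):
    """True once the full monitoring window has elapsed since the swap."""
    return now - swap_time >= window


if __name__ == "__main__":
    # Example timestamps only
    swap = datetime(2023, 1, 10, 8, 0)
    seen = [datetime(2023, 1, 9, 22, 0), datetime(2023, 1, 11, 9, 0)]
    print(errors_in_window(swap, seen))
    print(monitoring_complete(swap, datetime(2023, 1, 12, 8, 0)))
```

An empty result from `errors_in_window` once `monitoring_complete` returns true means no errors were recorded during the monitoring period.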

If you are unable to swap or reseat the components, contact Dell Support or engage additional resources for assistance.

If the errors persist after the swap, follow the actions below:

  • If DIMM errors follow the DIMM to a new slot, replace the DIMM.
  • If DIMM errors stay with the same DIMM slot, replace the motherboard.
  • If DIMM errors persist after DIMM and motherboard replacement, initiate a WebEx for live troubleshooting with Dell Support.  
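
The first two rules above can be written as a small decision function. The Python sketch below is illustrative only: the function name and the slot labels (such as "A1") are hypothetical. The "no recurrence" and "inconclusive" outcomes are assumptions added to cover cases the rules do not address.

```python
def recommend_action(original_slot, new_slot, error_slots_after_swap):
    """Map where errors appear after a DIMM swap to an action from this article.

    original_slot: slot the faulted DIMM was in before the swap
    new_slot: slot the faulted DIMM was moved to
    error_slots_after_swap: set of slots reporting errors after the swap
    """
    followed_dimm = new_slot in error_slots_after_swap
    stayed_with_slot = original_slot in error_slots_after_swap

    if followed_dimm and not stayed_with_slot:
        return "replace the DIMM"
    if stayed_with_slot and not followed_dimm:
        return "replace the motherboard"
    if not followed_dimm and not stayed_with_slot:
        # Assumption: the article does not cover this case
        return "no recurrence observed"
    # Assumption: errors in both slots are treated as inconclusive
    return "inconclusive: engage Dell Support"


if __name__ == "__main__":
    # Example: the faulted DIMM moved from A1 to B1 and errors now appear in B1
    print(recommend_action("A1", "B1", {"B1"}))
```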

Affected Products

Converged Infrastructure
Article Properties
Article Number: 000194121
Article Type: How To
Last Modified: 10 Jan 2023
Version:  3