ECS: xDoctor: RAP163: Critical System Memory Event

Summary: A critical system memory event occurred, and the affected DIMM must be reviewed for replacement.

Symptoms

xDoctor reports a critical system memory event that needs review.
------------------------------------
ERROR - Critical System Memory Event
------------------------------------
Node      = Nodes
Extra     = {'Nodes': {'169.254.1.1': ['Memory #0x02 - Uncorrectable ECC (UnCorrectable ECC |  DIMMB1) (06/10/2023 08:45:16)', 'Memory #0x03 - Uncorrectable ECC (UnCorrectable ECC |  DIMMB1) (06/10/2023 08:45:16)', 'Memory Mmry ECC Sensor - Correctable ECC (11/26/2015 12:38:51)']}}
RAP       = RAP163
Solution  = KB 215723
Timestamp = 2023-07-10_170539
PSNT      = CKMXXXXXXXXXXX @ 4.8-92.0

Cause

NOTE: If any DIMM is missing or shows an uncorrectable event in the System Event Log (SEL), it must be replaced.
  1. Check the SEL to confirm that the node has logged uncorrectable errors.

Command (remote):

# sudo ipmitool -H <iDRAC IP> -U root -P passwd -I lanplus sel elist

Command (local node):

# sudo ipmitool sel elist

Example:

admin@node1:~> sudo ipmitool -H 192.XXX.2XX.107 -U root -P passwd -I lanplus sel elist
   1 | 12/04/2021 | 07:29:19 | Event Logging Disabled SEL | Log area reset/cleared | Asserted
   2 | 12/29/2021 | 23:00:29 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
   3 | 01/26/2022 | 11:44:08 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
   4 | 08/03/2022 | 18:31:45 | Power Supply PS Redundancy | Redundancy Lost | Asserted
   5 | 08/03/2022 | 18:31:48 | Power Supply Status | Power Supply AC lost | Asserted
   6 | 08/03/2022 | 18:43:14 | Power Supply Status | Power Supply AC lost | Deasserted
   7 | 08/03/2022 | 18:43:22 | Power Supply PS Redundancy | Fully Redundant | Asserted
   8 | 08/03/2022 | 18:51:27 | Power Supply PS Redundancy | Redundancy Lost | Asserted
   9 | 08/03/2022 | 18:51:27 | Power Supply Status | Power Supply AC lost | Asserted
   a | 08/03/2022 | 19:02:03 | Power Supply Status | Power Supply AC lost | Deasserted
   b | 08/03/2022 | 19:02:14 | Power Supply PS Redundancy | Fully Redundant | Asserted
   c | 01/19/2023 | 05:38:27 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
   d | 02/06/2023 | 02:10:25 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
   e | 03/02/2023 | 17:12:15 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
   f | 05/09/2023 | 15:56:41 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMA1) | Asserted
  10 | 05/09/2023 | 17:16:16 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
  11 | 05/09/2023 | 20:57:41 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMA1) | Asserted
  12 | 05/09/2023 | 20:59:25 | Unknown #0x2e |  | Asserted
  13 | 05/09/2023 | 20:59:25 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMB1) | Asserted
  14 | 05/11/2023 | 05:43:34 | Memory Mem ECC Warning | Transition to Critical from less severe | Asserted
  15 | 06/10/2023 | 08:43:26 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMA1) | Asserted
  16 | 06/10/2023 | 08:45:16 | Unknown #0x2e |  | Asserted
  17 | 06/10/2023 | 08:45:16 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMA1) | Asserted
  18 | 06/10/2023 | 08:45:16 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMB1) | Asserted
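
In a long SEL listing, the uncorrectable ECC entries can be hard to spot among power supply and warning events. The following is a minimal sketch of a filter that counts them per DIMM. It is not part of the KB procedure: the function name uncorrectable_dimms is hypothetical, and it assumes the line format shown in the example above, where the DIMM label appears as DIMM<slot> in the event description.

```shell
#!/bin/sh
# Hypothetical helper: reads `ipmitool sel elist` output on stdin and prints
# "<DIMM label>: <number of Uncorrectable ECC entries>", sorted by label.
# Assumes the SEL line format shown in the example above.
uncorrectable_dimms() {
    awk '
        tolower($0) ~ /uncorrectable ecc/ {
            # Pick the first DIMM label on the line, e.g. "DIMMA1".
            if (match($0, /DIMM[A-Za-z0-9]+/)) {
                count[substr($0, RSTART, RLENGTH)]++
            }
        }
        END {
            for (dimm in count) print dimm ": " count[dimm]
        }
    ' | sort
}

# Usage (local node):
#   sudo ipmitool sel elist | uncorrectable_dimms
```

Applied to the example listing above, the filter would report DIMMA1 with four entries and DIMMB1 with two.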
  2. Confirm whether any DIMMs are missing because of the event.

Command:

# sudo dmidecode -t memory | grep "Locator\|Size" | grep -v "Cache\|Volatile\|Cache\|Logical\|Bank"

Example:

admin@node1:~> sudo dmidecode -t memory | grep "Locator\|Size" | grep -v "Cache\|Volatile\|Cache\|Logical\|Bank"
        Size: No Module Installed <-- DIMM is missing
        Locator: A1
        Size: 16384 MB
        Locator: A2
        Size: No Module Installed
        Locator: A3
        Size: No Module Installed
        Locator: A4
        Size: No Module Installed
        Locator: A5
        Size: No Module Installed
        Locator: A6
        Size: No Module Installed
        Locator: A7
        Size: No Module Installed
        Locator: A8
        Size: 16384 MB
        Locator: B1
        Size: 16384 MB
        Locator: B2
        Size: No Module Installed
        Locator: B3
        Size: No Module Installed
        Locator: B4
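
Because the filtered output alternates Size and Locator lines, it is easy to pair a size with the wrong slot when reading it by eye. The sketch below prints one line per slot instead. The function name dimm_slot_status is hypothetical, and the pairing assumes that each Size line precedes the Locator line of the same memory device, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: reads `dmidecode -t memory` output (raw, or filtered as
# in the command above) on stdin and prints "<locator> <status>" per slot.
# Assumes each "Size:" line precedes the "Locator:" line of the same device.
dimm_slot_status() {
    awk '
        /^[ \t]*Size:/ {
            # Remember whether the current memory device is populated.
            status = ($0 ~ /No Module Installed/) ? "EMPTY" : "POPULATED"
            next
        }
        /^[ \t]*Locator:/ {
            # Strip the field name, then print the slot with its status.
            sub(/^[ \t]*Locator:[ \t]*/, "")
            print $0, (status == "" ? "UNKNOWN" : status)
            status = ""
        }
    '
}

# Usage:
#   sudo dmidecode -t memory | dimm_slot_status
```

An EMPTY slot is not necessarily a fault. The example annotates only the first entry as missing, although other slots also report No Module Installed, so the result still has to be compared against the slots that are expected to be populated on the node.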

Resolution

Collect the output of the commands above and open a service request referencing KB 215723 to have the server DIMM reviewed for replacement.


After the DIMM has been replaced successfully, xDoctor version 4.8.92.0 or later requires the SEL to be cleared on the affected node. Clearing it stops further alerts for this log entry.


Example: clearing the System Event Log (SEL):

Query the iDRAC for the system event log and confirm that the error is present in the output.

Before clearing the SEL, check whether any other errors need to be addressed. Also save the log to /var/log/hardware as described in KB 49569.
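
One way to keep a copy of the SEL before it is cleared is sketched below. This is an illustration only and does not reproduce the procedure in KB 49569: the function name save_sel and the file naming scheme are assumptions, and writing to /var/log/hardware may require elevated privileges.

```shell
#!/bin/sh
# Hypothetical helper: saves the SEL of one iDRAC to a timestamped file and
# prints the path of the saved file.
# Arguments: <iDRAC IP> <user> <password> [output directory]
# The file name pattern is an assumption, not taken from KB 49569.
save_sel() {
    idrac_ip=$1
    user=$2
    pass=$3
    out_dir=${4:-/var/log/hardware}
    out_file="$out_dir/sel_${idrac_ip}_$(date +%Y%m%d_%H%M%S).log"

    # Keep the file only if ipmitool succeeds, so that a failed query
    # does not leave an empty or partial copy behind.
    if ipmitool -I lanplus -H "$idrac_ip" -U "$user" -P "$pass" sel elist > "$out_file.tmp"; then
        mv "$out_file.tmp" "$out_file"
        echo "$out_file"
    else
        rm -f "$out_file.tmp"
        echo "failed to read SEL from $idrac_ip" >&2
        return 1
    fi
}

# Usage:
#   save_sel 192.168.219.101 root passwd
```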

In this example, 192.168.219.101 is the iDRAC IP address of node 1:

admin@provo:~> ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel list
   1 | 01/06/2022 | 04:34:58 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
   2 | 02/03/2022 | 17:15:21 | Physical Security #0x73 | General Chassis intrusion () | Asserted
   3 | 02/03/2022 | 17:15:28 | Physical Security #0x73 | General Chassis intrusion () | Deasserted
   4 | 08/18/2023 | 01:44:01 | Memory #0x02 | Uncorrectable ECC (UnCorrectable ECC |  DIMMA1) | Asserted


Clear the SEL:

admin@provo:~> ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel clear
Clearing SEL.  Please allow a few seconds to erase.


Verify that the log was cleared:

admin@provo:~> ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel list
   1 | 08/30/2023 | 12:56:55 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
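
The same verification can be scripted. The sketch below succeeds only when the listing contains nothing other than the entry written by the clear operation itself. The function name sel_is_cleared is hypothetical, and the check assumes that a freshly cleared SEL looks like the example above, with a single "Log area reset/cleared" entry.

```shell
#!/bin/sh
# Hypothetical helper: reads `ipmitool sel list` output on stdin.
# Exit status 0: only "Log area reset/cleared" entries (or nothing) remain.
# Exit status 1: at least one other entry is still present.
sel_is_cleared() {
    awk '
        NF == 0 { next }                     # ignore blank lines
        /Log area reset\/cleared/ { next }   # entry written by the clear itself
        { remaining++ }
        END { exit (remaining > 0) }
    '
}

# Usage:
#   ipmitool -I lanplus -H 192.168.219.101 -U root -P passwd sel list | sel_is_cleared \
#       && echo "SEL cleared" || echo "SEL still has entries"
```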
 

Affected Products

ECS Appliance Gen 3

Products

ECS Appliance
Article Properties
Article Number: 000215723
Article Type: Solution
Last Modified: 30 May 2024
Version:  7