Dell PowerEdge 13G - Possible Reboot After "Correctable Memory Errors"

Summary: How to resolve a reboot that follows "Correctable memory error rate exceeded for DIMM_xx" on certain PowerEdge 13G servers.


Symptoms

iDRAC logs the following event: MEM0702 Correctable memory error rate exceeded for DIMM (Bank/Slot)

 

Cause

Table of Contents

1. Description
2. Solution
3. Further Information
 

 


Description

A correctable memory error is a single-bit error that occurs when a bit erroneously changes, from 1 to 0 or from 0 to 1, during a read or write operation. Once the specific bit in error is identified, the error is corrected by complementing that bit. Dell-certified DIMMs perform this correction automatically.
In rare instances, a server may reboot after a correctable memory error is recorded in the System Event Log (SEL). This has only been seen with BIOS version 2.3.x.

Example:

MEM0701 Warning Correctable memory error rate exceeded for DIMM_xx.
MEM0702 Critical Correctable memory error rate exceeded for DIMM_xx.


LC Log example:

2017-03-07 23:08:02 SYS1003 System CPU Resetting.
2017-03-07 23:08:02 SYS1001 System is turning off.
2017-03-07 23:08:02 MEM0702 Correctable memory error rate exceeded for DIMM_xx.
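
The pattern in the log excerpt above (a reset event logged at the same time as MEM0702) can be checked for in an exported log with a short script. The following is a minimal sketch, assuming the log has been saved as plain text with one "date time message-ID text" entry per line, as in the example. The function names and the 5-second matching window are hypothetical choices for illustration, not part of any Dell tool, and a real Lifecycle Controller export may use a different layout.

```python
from datetime import datetime

# Message IDs taken from the LC log example in this article.
RESET_IDS = {"SYS1001", "SYS1003"}
MEMORY_ID = "MEM0702"


def parse_line(line):
    """Split 'YYYY-MM-DD HH:MM:SS MSGID message text' into its parts.

    Raises ValueError if the line does not have that shape.
    """
    date, time, msg_id, text = line.split(maxsplit=3)
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return timestamp, msg_id, text


def resets_near_mem0702(lines, window_seconds=5):
    """Return (timestamp, msg_id) for each reset event logged within
    window_seconds of any MEM0702 entry. The window is an assumption."""
    events = [parse_line(line) for line in lines if line.strip()]
    mem_times = [ts for ts, msg_id, _ in events if msg_id == MEMORY_ID]
    return [
        (ts, msg_id)
        for ts, msg_id, _ in events
        if msg_id in RESET_IDS
        and any(abs((ts - m).total_seconds()) <= window_seconds for m in mem_times)
    ]


sample = [
    "2017-03-07 23:08:02 SYS1003 System CPU Resetting.",
    "2017-03-07 23:08:02 SYS1001 System is turning off.",
    "2017-03-07 23:08:02 MEM0702 Correctable memory error rate exceeded for DIMM_xx.",
]

for ts, msg_id in resets_near_mem0702(sample):
    print(ts.isoformat(sep=" "), msg_id)
```

A match only suggests that the reset and the memory event coincided; the BIOS version still has to be checked to confirm that the server is in the affected 2.3.x range.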

 

 

Resolution


Solution

To resolve the reboot issue, update the BIOS to the latest available version. If that is not possible for operational reasons, update the BIOS to at least the minimum version listed below:

 
Model                 Minimum BIOS version
R430                  2.4.2
T430                  2.4.2
R530                  2.4.2
T630                  2.4.2
R630                  2.4.3
R730                  2.4.3
R830                  1.4.2
C4130                 2.4.2
C6320                 2.4.2
All modular blades    2.4.2
Table 1: Relevant BIOS versions and models
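
The comparison against Table 1 can be scripted when many servers need to be checked. The following is a minimal sketch, assuming the model name and installed BIOS version have already been collected by some other means. The names `MIN_BIOS`, `MODULAR_BLADE_MIN`, `parse_version`, and `needs_update` are hypothetical, and the version numbers are copied from Table 1.

```python
# Minimum BIOS versions copied from Table 1 of this article.
MIN_BIOS = {
    "R430": (2, 4, 2),
    "T430": (2, 4, 2),
    "R530": (2, 4, 2),
    "T630": (2, 4, 2),
    "R630": (2, 4, 3),
    "R730": (2, 4, 3),
    "R830": (1, 4, 2),
    "C4130": (2, 4, 2),
    "C6320": (2, 4, 2),
}
# Table 1 lists one minimum for all modular blades.
MODULAR_BLADE_MIN = (2, 4, 2)


def parse_version(text):
    """Turn a dotted version string such as '2.3.4' into (2, 3, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_update(model, installed, modular_blade=False):
    """Return True if the installed BIOS is older than the Table 1 minimum."""
    if modular_blade:
        minimum = MODULAR_BLADE_MIN
    else:
        key = model.strip().upper()
        if key not in MIN_BIOS:
            raise ValueError(f"model {model!r} is not listed in Table 1")
        minimum = MIN_BIOS[key]
    # Tuples compare element by element, so (2, 3, 4) < (2, 4, 3).
    return parse_version(installed) < minimum


print(needs_update("R630", "2.3.4"))
print(needs_update("R830", "1.4.2"))
```

Versions are compared as integer tuples rather than strings so that, for example, 2.10.0 is treated as newer than 2.4.3.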
 
Note: The T130, R230, T330, R330, and R930 are not affected by this issue.
Note: If correctable memory errors still occur after the BIOS update, follow the standard memory troubleshooting process.

 


Further Information

This issue has primarily been reported on the PowerEdge R630 and R730; however, it can occur on any PowerEdge 13G server running BIOS version 2.3.x. BIOS version 2.3.x added logging to the DIMM's Serial Presence Detect (SPD) data, and that change introduced this issue:

"A NULL pointer dereferencing in BIOS enhanced SPD logging after memory correctable error critical threshold exceeded, would cause system to machine check or lock up."

The BIOS versions listed above for the affected platforms fix the server reboot that occurs together with the "Correctable memory error rate exceeded" message.


Affected Products

PowerEdge C6320, PowerEdge FC430, PowerEdge FC630, PowerEdge FC830, PowerEdge M630, PowerEdge M630 (for PE VRTX), PowerEdge M830, PowerEdge M830 (for PE VRTX), PowerEdge R430, PowerEdge R530, PowerEdge R530xd, PowerEdge R730, PowerEdge R730xd, PowerEdge R830, PowerEdge R930, PowerEdge T630 ...
Article Properties
Article Number: 000141221
Article Type: Solution
Last Modified: 18 Jul 2023
Version: 5