iDRAC logs the following event: MEM0702 Correctable memory error rate exceeded for DIMM (Bank/Slot)
1. Description
2. Solution
3. Further Information
Description
A correctable memory error is a single-bit error: one bit erroneously changes from 1 to 0, or from 0 to 1, during a read or write operation. Once the specific bit in error is identified, the error is corrected by complementing (inverting) that bit. Dell certified DIMMs perform this correction automatically.
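The correction step described above can be illustrated with a toy Hamming(7,4) code. This is only a sketch of the general idea (locate the flipped bit, then complement it); it is not the ECC scheme used by server DIMMs, which typically protect much wider words. The function names `encode` and `correct` are illustrative.

```python
def encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def correct(word):
    """Return (corrected_word, position); position is 0 if no error was found."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome gives the 1-based bit position
    if position:
        w[position - 1] ^= 1  # complement the erroneous bit
    return w, position


codeword = encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1  # simulate a single-bit flip at position 5
repaired, where = correct(damaged)
print(where, repaired == codeword)
```

A code like this can only repair one flipped bit per word; the error-rate thresholds behind MEM0701 and MEM0702 exist because a rising count of such corrections suggests a DIMM that may be degrading.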
In rare instances, a server may reboot after a correctable memory error is recorded in the System Event Log (SEL). This has only been seen with BIOS version 2.3.x.
Example:
MEM0701 Warning Correctable memory error rate exceeded for DIMM_xx.
MEM0702 Critical Correctable memory error rate exceeded for DIMM_xx.
LC Log example:
2017-03-07 23:08:02 SYS1003 System CPU Resetting.
2017-03-07 23:08:02 SYS1001 System is turning off.
2017-03-07 23:08:02 MEM0702 Correctable memory error rate exceeded for DIMM_xx.
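To check an exported Lifecycle Controller log for this pattern, a short script can look for MEM0702 entries that coincide with a system reset. The sketch below assumes each line has the `date time message-ID text` layout shown in the example above; real exports may be formatted differently, and the `window_seconds` parameter is a hypothetical tolerance rather than a documented value.

```python
from datetime import datetime

RESET_IDS = {"SYS1001", "SYS1003"}  # power-off / CPU reset entries from the example


def parse_lc_line(line):
    """Split 'YYYY-MM-DD HH:MM:SS MSGID text' into (timestamp, msg_id, text)."""
    date, time, msg_id, text = line.strip().split(" ", 3)
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return timestamp, msg_id, text


def find_reboot_signature(lines, window_seconds=5):
    """Return timestamps of MEM0702 entries logged close to a reset entry."""
    events = []
    for line in lines:
        try:
            events.append(parse_lc_line(line))
        except ValueError:
            continue  # skip blank lines or lines in an unexpected format
    resets = [ts for ts, msg_id, _ in events if msg_id in RESET_IDS]
    return [
        ts
        for ts, msg_id, _ in events
        if msg_id == "MEM0702"
        and any(abs((ts - r).total_seconds()) <= window_seconds for r in resets)
    ]


SAMPLE = [
    "2017-03-07 23:08:02 SYS1003 System CPU Resetting.",
    "2017-03-07 23:08:02 SYS1001 System is turning off.",
    "2017-03-07 23:08:02 MEM0702 Correctable memory error rate exceeded for DIMM_xx.",
]
print(find_reboot_signature(SAMPLE))
```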
Solution
To resolve the reboot issue, update the BIOS to the latest available version. If this is not possible for operational reasons, update the BIOS to at least the minimum version listed below:
Model | Minimum BIOS version |
R430 | 2.4.2 |
T430 | 2.4.2 |
R530 | 2.4.2 |
T630 | 2.4.2 |
R630 | 2.4.3 |
R730 | 2.4.3 |
R830 | 1.4.2 |
C4130 | 2.4.2 |
C6320 | 2.4.2 |
All modular blades | 2.4.2 |
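When auditing several servers, the table above can be turned into a simple lookup. The following is a minimal sketch: the model names and versions are copied from the table, while `needs_update`, `version_tuple`, and the `is_modular_blade` flag are illustrative names. How the installed BIOS version is obtained is left to the reader.

```python
MIN_BIOS = {
    "R430": "2.4.2",
    "T430": "2.4.2",
    "R530": "2.4.2",
    "T630": "2.4.2",
    "R630": "2.4.3",
    "R730": "2.4.3",
    "R830": "1.4.2",
    "C4130": "2.4.2",
    "C6320": "2.4.2",
}
MODULAR_BLADE_MIN = "2.4.2"  # the table lists one value for all modular blades


def version_tuple(version):
    """Convert '2.4.3' to (2, 4, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_update(model, installed, is_modular_blade=False):
    """Return True if the installed BIOS is older than the listed minimum."""
    minimum = MODULAR_BLADE_MIN if is_modular_blade else MIN_BIOS.get(model.upper())
    if minimum is None:
        raise KeyError(f"No minimum BIOS version listed for model {model!r}")
    return version_tuple(installed) < version_tuple(minimum)


print(needs_update("R630", "2.3.4"))
print(needs_update("R730", "2.4.3"))
```

Comparing tuples of integers avoids the pitfall of string comparison, where "2.10.0" would sort before "2.4.3".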
Further Information
This issue has primarily been reported on the PowerEdge R630 and R730; however, the potential exists on all 13G servers running BIOS version 2.3.x. BIOS version 2.3.x added logging of Serial Presence Detect (SPD) data, and that change introduced this issue:
"A NULL pointer dereferencing in BIOS enhanced SPD logging after memory correctable error critical threshold exceeded, would cause system to machine check or lock up."
The BIOS versions listed above for the affected platforms fix the server reboot that occurs in conjunction with the correctable error rate exceeded message.