Unsolved
1 Rookie
R730 Second CPU addition and RAM upgrade issue.
We tried to upgrade a PowerEdge R730 by adding a second CPU and extra RAM for it, but we ran into a problem. When starting the server, the following two messages appeared:
- A problem was detected in Memory Reference Code (MRC)
- Multi-bit memory errors detected on a memory device at locations DIMM_B1.
The BIOS also recognized only 112GB instead of the full 128GB (CPU1 already had 64GB and we added another 64GB for CPU2).
The memory configuration at the time of the upgrade was as follows:
CPU1
A1, A2, A3, A4: 8GB each
A5, A6: 16GB each
CPU2
B1, B2, B3, B4: 16GB each
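As a quick sanity check on the numbers, here is a minimal sketch in Python. The slot sizes are transcribed from the list above and the variable names are only illustrative. It shows that 112GB is exactly the installed total minus the 16GB module in B1, which is consistent with (but does not prove) the BIOS excluding only that one DIMM:

```python
# Slot -> DIMM size in GB, transcribed from the configuration listed above.
cpu1 = {"A1": 8, "A2": 8, "A3": 8, "A4": 8, "A5": 16, "A6": 16}
cpu2 = {"B1": 16, "B2": 16, "B3": 16, "B4": 16}

installed = {**cpu1, **cpu2}
total_gb = sum(installed.values())

# Capacity left if the DIMM named in the error message (B1) is not counted.
without_b1_gb = total_gb - installed["B1"]

print(total_gb)       # 128
print(without_b1_gb)  # 112
```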
We tried the following without success:
- We upgraded the BIOS and iDRAC to the latest version
- We changed the DIMM in position B1
- We changed the 2nd CPU
- We set the memory config of CPU1 to be the same as CPU2 (they were not the same initially)
- We completely removed DIMM B1
and we always had the same error.
Things we didn't try:
- Self healing via reboot
- Log Clearing between troubleshooting steps
Do you think the B1 socket is faulty, or did we miss something during troubleshooting?
DELL-Charles R
Moderator
June 17th, 2024 19:45
Hello,
A DIMM with multibit errors should typically be replaced.
Are you saying you get a multibit error on DIMM_B1 with no DIMM installed in that slot?
Try these troubleshooting steps:
Check CPU and DIMM sockets for bent pins
Try one DIMM per processor at a time (one in A1 and one in B1), run the test, and repeat until all DIMMs have been tested two at a time.
To run the test, press F11 at the Dell splash screen, then select Boot Manager -> System Utilities -> Launch Dell Diagnostics. Note any messages and continue testing.
Replace any DIMM that has a multibit error
You may also try removing CPU2, fully populating the memory for CPU1, and checking for errors
A few initial callouts for memory population:
* Populate all the sockets with white release tabs first, followed by the black release tabs, and then the green release tabs.
* In a dual-processor configuration, the memory configuration for each processor should be identical. For example, if you populate socket A1 for processor 1, then populate socket B1 for processor 2, and so on.
* When mixing memory modules with different capacities, populate the sockets with memory modules with highest capacity first. For example, if you want to mix 16 GB and 8 GB memory modules, populate 16 GB memory modules in the sockets with white release tabs and 8 GB memory modules in the sockets with black release tabs.
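To make two of these guidelines concrete, here is a rough Python sketch that checks a population for identical per-processor layout and for capacity ordering. It is only an illustration: the function names are made up, and the mapping of slots 1-4 to white tabs, 5-8 to black, and 9-12 to green is an assumption that should be confirmed against the Owner's Manual. It does not check that white-tab sockets are filled before black ones.

```python
import re

def tab_rank(slot: str) -> int:
    """Return 0 for white, 1 for black, 2 for green release tabs.

    Assumption: slots 1-4 are white, 5-8 black, 9-12 green.
    """
    number = int(re.fullmatch(r"[AB](\d+)", slot).group(1))
    return (number - 1) // 4

def check_population(cpu1: dict, cpu2: dict) -> list:
    """Return a list of guideline violations (empty if none are found)."""
    problems = []

    # Guideline: both processors populated identically (A1 matches B1, ...).
    a = {int(slot[1:]): gb for slot, gb in cpu1.items()}
    b = {int(slot[1:]): gb for slot, gb in cpu2.items()}
    if a != b:
        problems.append("CPU1 and CPU2 are not populated identically")

    # Guideline: highest-capacity modules go in the earliest tab colour.
    for name, slots in (("CPU1", cpu1), ("CPU2", cpu2)):
        by_rank = {}
        for slot, gb in slots.items():
            by_rank.setdefault(tab_rank(slot), []).append(gb)
        ranks = sorted(by_rank)
        for earlier, later in zip(ranks, ranks[1:]):
            if min(by_rank[earlier]) < max(by_rank[later]):
                problems.append(f"{name}: larger DIMMs sit behind smaller ones")
                break
    return problems

# The configuration described in the first post at the time of the upgrade.
original = check_population(
    {"A1": 8, "A2": 8, "A3": 8, "A4": 8, "A5": 16, "A6": 16},
    {"B1": 16, "B2": 16, "B3": 16, "B4": 16},
)
for problem in original:
    print(problem)
```

Run against the configuration from the first post, this flags both the mismatch between CPU1 and CPU2 and the 16GB modules sitting behind the 8GB ones on CPU1.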
Please see the Owner's Manual for more:
https://dell.to/45rWEfE
Memory specs: pages 29 and 79-85
General memory module installation guidelines
Sample memory configurations
DiGr
1 Rookie
June 18th, 2024 06:33
Hi DELL-Charles R,
Yes, even without any DIMM in B1 (but with DIMMs in B2, B3, and B4) we get the multibit error for slot B1! If we remove all DIMMs from the Bx sockets we get no error. Maybe clearing the logs (which we didn't do) will fix this and let us install another good DIMM in B1 without errors?
In dual-CPU setups, I understand that we have to install DIMMs in the same positions for each CPU (A1/B1, A2/B2). Should each Ax/Bx pair also have DIMMs of the same capacity?
What about self-healing? Is it able to fix multibit errors? Where can we find more information about it?
DELL-Marco B
Moderator
June 18th, 2024 09:54
Hello,
In a dual-CPU setup, the capacity of each Ax/Bx pair must also be the same.
Yes, I suggest clearing the logs and then reinstalling the memory modules.
Here is more info about self-healing:
What is DDR4 Self-healing on Dell PowerEdge Servers with Intel Xeon Scalable Processors | Dell US
Importantly, which BIOS version is installed?
Thanks
DiGr
1 Rookie
June 18th, 2024 10:47
@DELL-Marco B We installed the latest BIOS and iDRAC.
DELL-Charles R
Moderator
June 18th, 2024 20:26
Hello,
Were you able to test the memory two DIMMs at a time, one in A1 and one in B1, and work through all the DIMMs that way?
Test:
Press F11 at the Dell splash screen, then select Boot Manager -> System Utilities -> Launch Dell Diagnostics. Note any messages and continue testing.
You are correct: if you have two CPUs, the memory population for each CPU should be identical. For example, A1 and B1 should be the same, including capacity.
DiGr
1 Rookie
June 19th, 2024 05:45
@DELL-Charles R Since this is a production server, we will do the troubleshooting next Saturday and let you know the results.
One last question. If troubleshooting shows that the B1 socket or the CPU socket is faulty and the motherboard has to be replaced, which is not acceptable cost-wise, is there a way to set up the memory on both CPUs to work around this, e.g. by leaving the A1/B1 sockets empty or arranging the DIMMs in another way?
Dell-Martin S
Moderator
June 19th, 2024 13:11
Thanks for letting us know. It is possible to use other DIMM configurations; we just do not recommend it.
DiGr
1 Rookie
July 1st, 2024 10:16
After clearing the server logs, we tested the memory two DIMMs at a time in A1 and B1, and all DIMMs passed both the initial boot test and the system diagnostics.
BUT when we started adding DIMMs in A2 and B2, we again got the multibit memory error on DIMM B1!
To avoid it, we populated A2/B2, A3/B3, and A4/B4 with DIMMs. We didn't get the full 128GB we wanted, only 96GB, but that is sufficient for the time being and the server boots normally.
We tried moving the extra DIMMs from A1/B1 to A5/B5, but the server reported that this population is out of order and did not accept it. So we will have an issue in the future if we want to add more RAM.