Unsolved

1 Rookie

13 Posts

June 17th, 2024 13:53

R730: issue after adding a second CPU and upgrading RAM

We tried to upgrade a PowerEdge R730 by adding a second CPU and extra RAM for it, but we ran into a problem. When starting the server, the following two messages appeared:

  • A problem was detected in Memory Reference Code (MRC)
  • Multi-bit memory errors detected on a memory device at locations DIMM_B1.

In addition, the BIOS recognized 112GB instead of the full 128GB (CPU1 already had 64GB, and we added another 64GB for CPU2).

The memory configuration at the time of the upgrade was as follows:

CPU1
A1, A2, A3, A4: 8GB each
A5, A6: 16GB each

CPU2
B1, B2, B3, B4: 16GB each
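As a quick sanity check on the numbers above, here is a small sketch that totals the listed configuration (sizes in GB, copied from this post) and compares it with the 112GB the BIOS reported. The variable names are made up for illustration. The missing 16GB equals exactly one 16GB module. That is consistent with the BIOS having taken one DIMM out of service, presumably the one in B1 named in the error, although the arithmetic alone cannot tell which slot it was.

```python
# Sketch: total the configuration from the post and see what the BIOS total implies.
config = {
    "CPU1": {"A1": 8, "A2": 8, "A3": 8, "A4": 8, "A5": 16, "A6": 16},
    "CPU2": {"B1": 16, "B2": 16, "B3": 16, "B4": 16},
}

per_cpu = {cpu: sum(slots.values()) for cpu, slots in config.items()}
installed_gb = sum(per_cpu.values())
reported_gb = 112  # total shown by the BIOS after the upgrade
missing_gb = installed_gb - reported_gb

# Single modules whose size matches the missing capacity exactly.
candidates = sorted(
    slot
    for slots in config.values()
    for slot, size in slots.items()
    if size == missing_gb
)

print(per_cpu)       # {'CPU1': 64, 'CPU2': 64}
print(installed_gb)  # 128
print(missing_gb)    # 16
print(candidates)    # ['A5', 'A6', 'B1', 'B2', 'B3', 'B4']
```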

We tried the following, without success:

  • We upgraded the BIOS and iDRAC to the latest versions
  • We replaced the DIMM in slot B1
  • We replaced the second CPU
  • We made the memory configuration of CPU1 identical to that of CPU2 (initially they differed)
  • We removed DIMM B1 completely

In every case we got the same error.

Things we didn't try:

  • Self-healing via reboot
  • Clearing the logs between troubleshooting steps

Do you think the B1 socket is faulty, or did we miss something during troubleshooting?

Moderator

3.5K Posts

June 17th, 2024 19:45

Hello,

A DIMM with multibit errors should typically be replaced.

Are you saying you get a multibit error on DIMM_B1 with no DIMM installed in that slot?

Try these troubleshooting steps:

Check the CPU and DIMM sockets for bent pins.

Try one DIMM per processor at a time, in slots A1 and B1, and run the test, cycling through all DIMMs two at a time.

Press F11 at the Dell splash screen, then select Boot Manager -> System Utilities -> Launch Dell Diagnostics. Note any messages and continue testing.

Replace any DIMM that reports a multibit error.

You may also try removing CPU2, fully populating the memory for CPU1, and checking for errors.
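The "two DIMMs at a time" step can be planned as a list of test rounds. The sketch below is a hypothetical helper, not a Dell tool. It pairs modules of equal capacity, since A1 and B1 should hold modules of the same size, so that each round puts one module in A1 and one in B1 and every module is tested exactly once. The module labels are assumed to record the slot each module came from in the first post's configuration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pairwise_rounds(dimms: Dict[str, int]) -> List[Tuple[str, str]]:
    """Pair modules of equal capacity into (A1, B1) test rounds, largest first."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for label, size_gb in dimms.items():
        by_size[size_gb].append(label)
    rounds: List[Tuple[str, str]] = []
    for size_gb in sorted(by_size, reverse=True):
        group = sorted(by_size[size_gb])
        if len(group) % 2:
            raise ValueError(f"odd number of {size_gb}GB modules; one cannot be paired")
        # Consecutive modules of the same size form one round.
        rounds += [(group[i], group[i + 1]) for i in range(0, len(group), 2)]
    return rounds


# Hypothetical labels: the slot each module came from (sizes in GB).
modules = {"A1": 8, "A2": 8, "A3": 8, "A4": 8, "A5": 16, "A6": 16,
           "B1": 16, "B2": 16, "B3": 16, "B4": 16}
rounds = pairwise_rounds(modules)
for n, (to_a1, to_b1) in enumerate(rounds, start=1):
    print(f"Round {n}: A1 <- module from {to_a1}, B1 <- module from {to_b1}; "
          "then run Dell Diagnostics")
```

With that configuration the plan has five rounds: three with 16GB pairs and two with 8GB pairs.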

 

 

A few initial callouts on memory population:

* Populate all sockets with white release tabs first, followed by those with black release tabs, and then those with green release tabs.

* In a dual-processor configuration, the memory configuration for each processor should be identical. For example, if you populate socket A1 for processor 1, then populate socket B1 for processor 2, and so on.

* When mixing memory modules with different capacities, populate the sockets with the highest-capacity modules first. For example, if you want to mix 16 GB and 8 GB memory modules, populate the 16 GB modules in the sockets with white release tabs and the 8 GB modules in the sockets with black release tabs.
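The three callouts above can be expressed as a rough checker. This is a minimal sketch under stated assumptions, not Dell's validation logic. It assumes 12 slots per CPU, with slots 1-4 on white tabs, 5-8 on black tabs and 9-12 on green tabs (verify this against the Owner's Manual). It checks only these three rules, so a layout that passes may still be rejected by the BIOS for other reasons.

```python
from typing import Dict, List

# ASSUMPTION (verify in the Owner's Manual): 12 slots per CPU,
# slots 1-4 have white tabs, 5-8 black tabs, 9-12 green tabs.
COLORS = ["white", "black", "green"]
SLOTS_PER_COLOR = 4
SLOTS_PER_CPU = len(COLORS) * SLOTS_PER_COLOR


def color_of(slot_no: int) -> int:
    """Index into COLORS for a slot number (1-12)."""
    return (slot_no - 1) // SLOTS_PER_COLOR


def check_population(layout: Dict[str, int]) -> List[str]:
    """List rule violations for a layout like {"A1": 16, "B1": 16} (GB per slot)."""
    problems: List[str] = []
    for bank in ("A", "B"):
        slots = {int(name[1:]): gb for name, gb in layout.items() if name[0] == bank}
        if not slots:
            continue
        # Callout 1: fill every white slot before any black one, every black before green.
        last_color = max(color_of(n) for n in slots)
        for n in range(1, last_color * SLOTS_PER_COLOR + 1):
            if n not in slots:
                problems.append(f"{bank}{n} is empty but a later-color slot is populated")
        # Callout 3: higher-capacity modules belong in earlier-color slots.
        for n, gb in slots.items():
            for m, other_gb in slots.items():
                if color_of(m) > color_of(n) and other_gb > gb:
                    problems.append(f"{bank}{m} ({other_gb}GB) is larger than {bank}{n} ({gb}GB)")
    # Callout 2: with a second CPU, slot An and slot Bn must match (presence and size).
    if any(name.startswith("B") for name in layout):
        for n in range(1, SLOTS_PER_CPU + 1):
            a_gb, b_gb = layout.get(f"A{n}"), layout.get(f"B{n}")
            if a_gb != b_gb:
                problems.append(f"A{n} ({a_gb}) and B{n} ({b_gb}) differ")
    return problems


# Layout from the first post (sizes in GB).
original = {"A1": 8, "A2": 8, "A3": 8, "A4": 8, "A5": 16, "A6": 16,
            "B1": 16, "B2": 16, "B3": 16, "B4": 16}
# Hypothetical all-16GB layouts: A1/B1 left empty with A5/B5 used, and A2-A4/B2-B4 only.
skip_first = {f"{bank}{n}": 16 for bank in "AB" for n in (2, 3, 4, 5)}
white_only = {f"{bank}{n}": 16 for bank in "AB" for n in (2, 3, 4)}

for name, layout in [("original", original), ("skip_first", skip_first),
                     ("white_only", white_only)]:
    print(name, check_population(layout) or "no violations of these three rules")
```

Under these assumptions the first post's layout breaks the second and third callouts (8GB modules on white tabs with 16GB on black, and CPU1 and CPU2 not matching). A layout that leaves A1/B1 empty but uses A5/B5 breaks the first callout.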

 

For more details, see the Owner's Manual:

https://dell.to/45rWEfE

Memory specifications: pages 29 and 79-85

General memory module installation guidelines

Sample memory configurations

 

1 Rookie

13 Posts

June 18th, 2024 06:33

Hi DELL-Charles R,

Yes, even with no DIMM in B1 (but with DIMMs in B2, B3 and B4) we get the multibit error for slot B1! If we remove all DIMMs from the Bx sockets, we get no error. Could clearing the logs (which we haven't done) fix this and let us install another good DIMM in B1 without errors?

In dual-CPU setups, I understand that we have to install DIMMs in the same positions for each CPU (A1/B1, A2/B2, and so on). Should each Ax/Bx pair also have DIMMs of the same capacity?

What about self-healing? Can it fix multi-bit errors? Where can we find more information about it?


Moderator

3.5K Posts

June 18th, 2024 09:54

Hello,

In a dual-CPU setup, each Ax/Bx pair must also have the same capacity.

Yes, I suggest clearing the logs and then reinstalling the memory modules.

Here is more information about self-healing:

What is DDR4 Self-healing on Dell PowerEdge Servers with Intel Xeon Scalable Processors | Dell US

Also important: which BIOS version is installed?
Thanks

1 Rookie

13 Posts

June 18th, 2024 10:47

@DELL-Marco B We installed the latest BIOS and iDRAC versions.

Moderator

3.5K Posts

June 18th, 2024 20:26

Hello,

Were you able to run the memory test two DIMMs at a time, one in A1 and one in B1, and test all DIMMs that way?

Try one DIMM per processor at a time, in slots A1 and B1, and run the test, cycling through all DIMMs two at a time.

Test:

Press F11 at the Dell splash screen, then select Boot Manager -> System Utilities -> Launch Dell Diagnostics. Note any messages and continue testing.

You are correct: if you have two CPUs, the memory population for both CPUs should be identical. For example, A1 and B1 should match, including capacity.

1 Rookie

13 Posts

June 19th, 2024 05:45

@DELL-Charles R Since this is a production server, we will do the troubleshooting next Saturday and let you know the results.

One last question: if troubleshooting shows that the B1 or CPU socket is faulty and we would have to replace the motherboard (which is not acceptable cost-wise), is there a way to set up the memory on both CPUs to work around this, e.g. by leaving the A1/B1 sockets empty or arranging the DIMMs in another way?

Moderator

3.2K Posts

June 19th, 2024 13:11

Thanks for letting us know. It is possible to use other DIMM configurations; we just do not recommend it.

 

1 Rookie

13 Posts

July 1st, 2024 10:16

After clearing the server logs, we ran the memory test two DIMMs at a time in slots A1 and B1, and all DIMMs passed both the initial boot test and the system diagnostics.

BUT when we started adding DIMMs in A2/B2, we again got the multibit memory error on DIMM B1!!

To avoid it, we populated A2/B2, A3/B3 and A4/B4 with DIMMs. This gives us only 96GB instead of the full 128GB we wanted, but that is sufficient for the time being, and the server boots normally.

We tried moving the extra DIMMs from A1/B1 to the next slots, A5/B5, but the server reported that the DIMMs were populated out of order and did not accept the configuration. So we will have an issue in the future if we want to add more RAM.
