4 Posts

December 29th, 2011 12:00

Ram Report - "wrong values reported by BIOS"

My server crashes about once a week and throws some RAM error messages.  A reboot gets it back up.  I am just now working on the issue.  When I run a system report in CPU-Z, the SPD section is grayed out.  So I ran a report using SIW and got the warning below:

Memory Summary - Reported by Bios

Warning - Wrong values reported by BIOS
Maximum Capacity   - 196608 MBytes
Maximum Memory Size - Unknown
Error Correction - Multi-bit ECC
DRAM Frequency 665.1 MHZ
Memory Timings - 9-9-9-24

Then it lists what is in each slot.  DIMMS A1 through A6 and then DIMMS B1 through B6.

Memory Type (8GB sticks) is:

  • HYNIX HMT31GR7BFR4C-H9
  • DDR3 SDRAM PC3-10600
  • CL=9
  • ECC Registered
  • DDR3-1333MHz
  • 1.5V
  •  x 72
  • 8 Bank
  • 240 pins
  • Refresh 8K
  • Number of DRAMS:  x 36

and

  • Hynix HMT31GR7AFR4C-H9
  • DDR3 SDRAM PC3-10600
  • CL=9
  • ECC Registered
  • DDR3-1333MHz
  • 1.5V
  •  x 72
  • 240 pins
  • Refresh 8K / 64ms
  • Number of DRAMS:  x 36

This is a Dell R710, 2.67 GHz.  Do the two slightly different ID numbers on the RAM matter?  I couldn't find an explanation of the differences.  Thanks.
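
For reference, the figures in the SIW report can be converted into more familiar units with a little arithmetic. This is only a rough sketch: the variable names are made up for illustration, and it assumes all twelve slots (A1 through A6, B1 through B6) hold 8 GB sticks, which the report suggests but does not state outright.

```python
# Figures copied from the SIW report above.
max_capacity_mb = 196608   # "Maximum Capacity" as reported by the BIOS
dram_clock_mhz = 665.1     # "DRAM Frequency"

# Assumptions (not stated outright in the report): 12 populated slots, 8 GB each.
slots = 12
stick_gb = 8

max_capacity_gb = max_capacity_mb / 1024   # MBytes -> GBytes
installed_gb = slots * stick_gb            # total RAM if every slot is populated

# DDR transfers data on both clock edges, so the transfer rate is twice the clock.
transfer_rate = 2 * dram_clock_mhz

# A DDR3 DIMM carries 64 data bits (8 bytes) per transfer; the extra 8 bits
# of the "x 72" width are ECC and do not count toward bandwidth.
peak_bandwidth_mb_s = transfer_rate * 8

print(f"Maximum capacity: {max_capacity_gb:.0f} GB")
print(f"Installed (assumed): {installed_gb} GB")
print(f"Transfer rate: {transfer_rate:.1f} MT/s")
print(f"Peak bandwidth per DIMM: {peak_bandwidth_mb_s:.0f} MB/s")
```

The measured clock works out to roughly the DDR3-1333 / PC3-10600 rating printed on the sticks, so the frequency and timings lines look plausible even though the summary carries a warning.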

10 Elder

6.2K Posts

December 29th, 2011 18:00

Hello skbryan

I wouldn't worry about the letter difference on the model number.  That just indicates they are different generations of the memory.  One is the A and the other is the B generation.  If there is a different component used in manufacturing or any change is made to the production of the component then a new model number is created.

I would suggest that you test the memory using either our online diagnostics, which can be run within Windows, or the built-in diagnostic utility, which you can reach by booting to the USC via F10 during POST.

Online diags: ftp://ftp.dell.com/diags/dell-onlinediags-win32-2.18.0.11.exe

You should also have some minidumps from the crashes if you have minidumps enabled. If you have them and use the debugger to view the log, a file name will be listed at the end of the log as the likely cause of the crash.

Dump file KB article: http://support.microsoft.com/kb/315263

I would also recommend that you check the hardware log to see if there are errors on specific DIMMs or other components.  The easiest way to view the hardware log is with OpenManage Server Administrator.

OMSA 6.5: ftp://ftp.dell.com/sysman/OM-SrvAdmin-Dell-Web-WIN-6.5.0-2247_A01.10.exe

Our 11th generation servers have the memory controller built into the CPU, so if this is a memory issue then you would need to swap the memory around to determine if it is a bad slot on the board, a bad DIMM, or a bad memory controller on the CPU.  You would do this by moving the memory and running diags again to see if the errors stay with the slot, follow the DIMM, or randomly occur within a memory lane on one of the CPUs.

Thanks

4 Posts

December 30th, 2011 07:00

Thank you for your very thorough answer.  I did run the Dell diags utility, and no problems were detected.  I opened up the OpenManage Server Administrator that was already installed and saw the entries below.  I had to come in and reboot the server manually on the 24th.

 Wed Dec 14 12:16:44 2011  PS 1 Status: Power Supply sensor for PS 1, failure (PMBus communication error) was deasserted

 Wed Dec 14 12:16:44 2011  PS 1 Status: Power Supply sensor for PS 1, failure (PMBus communication error) was asserted

 Fri Dec 23 23:51:03 2011  Mem ECC Warning: Memory sensor, transition to critical from less severe ( DIMM_A2 ) was asserted

 Fri Dec 23 23:51:03 2011  Mem ECC Warning: Memory sensor, transition to non-critical from OK ( DIMM_A2 ) was asserted

 Sat Dec 24 08:28:21 2011  OEM event data record

 Sat Dec 24 08:28:21 2011  System Software event: OS Event sensor, C: boot completed was asserted
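
When the hardware log gets longer than this, a quick tally of memory events per DIMM can help. The following is a minimal sketch that works on log text pasted in the format shown above; it does not call any OMSA API, and the function name `dimm_event_counts` and the regular expression are assumptions made for illustration.

```python
import re
from collections import Counter

# Log lines pasted from the hardware log above.
LOG = """\
Wed Dec 14 12:16:44 2011  PS 1 Status: Power Supply sensor for PS 1, failure (PMBus communication error) was deasserted
Wed Dec 14 12:16:44 2011  PS 1 Status: Power Supply sensor for PS 1, failure (PMBus communication error) was asserted
Fri Dec 23 23:51:03 2011  Mem ECC Warning: Memory sensor, transition to critical from less severe ( DIMM_A2 ) was asserted
Fri Dec 23 23:51:03 2011  Mem ECC Warning: Memory sensor, transition to non-critical from OK ( DIMM_A2 ) was asserted
Sat Dec 24 08:28:21 2011  OEM event data record
Sat Dec 24 08:28:21 2011  System Software event: OS Event sensor, C: boot completed was asserted
"""

# Matches the slot name inside the parentheses, e.g. "( DIMM_A2 )".
DIMM_RE = re.compile(r"\(\s*(DIMM_[A-Z]\d+)\s*\)")

def dimm_event_counts(log_text):
    """Count memory ECC events per DIMM slot in pasted log text."""
    counts = Counter()
    for line in log_text.splitlines():
        if "Mem ECC" not in line:
            continue  # skip power supply, OS, and OEM events
        match = DIMM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

counts = dimm_event_counts(LOG)
print(counts.most_common())  # [('DIMM_A2', 2)]
```

Both memory events in this excerpt point at the same slot, DIMM_A2.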

Regarding the RAM, I need to get a few spares.  I was going to get the same type that is already in it, but is there a brand or generation of RAM that you would recommend for this server?

And why do you reckon the memory summary from SIW gives this warning - Wrong values reported by BIOS?

Thanks again for your help.    

2 Intern

847 Posts

December 30th, 2011 11:00

On the bright side, I think you are having an easily fixed RAM issue of some sort.  It's an odd one though, I admit.

10 Elder

6.2K Posts

December 30th, 2011 11:00

The reason it gave the "wrong values reported" warning is that the BIOS did not report the maximum supported memory to the application, or it provided a value the application did not understand.

It looks like the issue is with DIMM A2.  If you have two CPUs in the system then I would recommend that you swap DIMM A2 with DIMM B2. You will then need to wait for a new error to occur on the DIMMs.

  • If the error follows the DIMM to slot B2 then it is a faulty DIMM.
  • If the error stays with slot A2 then it is most likely a faulty slot on the system board.  I would recommend swapping the CPUs to make sure it is not CPU 1 though.
  • If the error moves to one of the other DIMMs in the A lane then it is most likely the CPU.  I would recommend swapping the CPUs and monitoring to make sure though.
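
The three outcomes above can be written down as a small lookup, which may be handy to keep with your notes while you wait for the next error. This is only an illustrative sketch of the reasoning; the function name `diagnose` and the slot naming ("A2", "B2") are hypothetical and not part of any Dell tool.

```python
def diagnose(original_slot, swapped_to_slot, new_error_slot):
    """Suggest the likely culprit after swapping the suspect DIMM.

    original_slot:   slot that first logged the error, e.g. "A2"
    swapped_to_slot: slot the suspect DIMM was moved to, e.g. "B2"
    new_error_slot:  slot that logs the next error after the swap
    """
    if new_error_slot == swapped_to_slot:
        # The error followed the DIMM to its new slot.
        return "faulty DIMM"
    if new_error_slot == original_slot:
        # The error stayed put even with a different DIMM installed.
        return "faulty slot on the system board (swap CPUs to rule out CPU 1)"
    if new_error_slot[0] == original_slot[0]:
        # A different slot on the same lane (same letter) now reports errors.
        return "most likely the CPU (swap CPUs and monitor)"
    return "inconclusive, keep monitoring"

print(diagnose("A2", "B2", "B2"))
print(diagnose("A2", "B2", "A2"))
print(diagnose("A2", "B2", "A4"))
```

The final "inconclusive" branch is an addition for completeness; it covers an error appearing somewhere the three cases above do not describe.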

Thanks

4 Operator

9.3K Posts

December 30th, 2011 14:00

Another possibility is that it's the processor. The current Xeon processors (5500 and 5600 series, like those used in your R710) have the memory controller in the processor. I once ran into a problem on a retail board with a Xeon E5502 I had lying around; I could never get the memory to be properly recognized. I took a close look at the processor socket and I am pretty sure a bent pin in the socket caused the memory detection issues. I just returned the board (didn't exchange it), so I never got 100% confirmation, but the proc and memory worked fine in my Dell Precision Workstation T7500.
