
March 8th, 2017 04:00

CPU0704 CPU1/CPU2 Machine check error detected on Front LCD Panel

We have an R730 that has been power cycling on its own, with a recurring amber front LCD showing the message "CPU0704 CPU1 (and CPU2) Machine check error detected. Power cycle system."

This first occurred after the initial Windows Server 2012 R2 load, Dell BIOS/driver/firmware updates, McAfee, and Windows updates. We initially saw this once a week, and it has occurred more and more frequently, currently several times a day.

In the Windows System event log these are registering as kernel power failures, and the Lifecycle Controller logs show multiple (dozens of) instances of "An OEM diagnostic event occurred." within a couple of seconds, preceded by the same CPU1/CPU2 machine check error.
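For anyone trying to quantify how often these entries occur, here is a minimal Python sketch that tallies message IDs and OEM diagnostic events. It assumes the Lifecycle Controller log has been exported to plain text with one entry per line; that layout, the ID pattern (inferred only from IDs like CPU0704), and the sample lines are all assumptions for illustration, not a documented Dell format.

```python
import re
from collections import Counter

# Assumed pattern: three or four capital letters followed by four digits,
# inferred from IDs such as CPU0704. Not taken from Dell documentation.
MESSAGE_ID = re.compile(r"\b[A-Z]{3,4}\d{4}\b")


def summarize_log(lines):
    """Return (Counter of message IDs, number of OEM diagnostic event lines)."""
    ids = Counter()
    oem_events = 0
    for line in lines:
        ids.update(MESSAGE_ID.findall(line))
        if "OEM diagnostic event" in line:
            oem_events += 1
    return ids, oem_events


# Hypothetical sample entries; real exported logs will look different.
sample = [
    "CPU0704 CPU 1 machine check error detected.",
    "An OEM diagnostic event occurred.",
    "An OEM diagnostic event occurred.",
    "CPU0704 CPU 2 machine check error detected.",
]

ids, oem_events = summarize_log(sample)
print(dict(ids), oem_events)
```

Running the same function over logs from different days would show whether the frequency is really increasing.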

Upon contacting Dell support we were asked several times to provide a DSET or TSR (despite a search indicating that DSET has been retired and TSR has been replaced with SupportAssist). Due to our environment limitations, the only way we are able to connect externally is via proxy, and iDRAC doesn't support proxy configurations. We tried running the Support Live Image recommended by a Dell tech, only to have our system reboot before diagnostics could complete.

Our requests for CPU and motherboard replacements via Dispatch, which referred to the efforts documented in the initial Service Request, were denied. Instead we were asked to update the BIOS and selected firmware (which we had done initially) and were again asked for a DSET.

We're currently letting support know we've already done this, and we feel like we've exhausted our options at this point. A search of the forums indicates that BIOS updates or CPU replacements have worked in most cases, and our BIOS is already current. Just wondering if anyone else has come across this issue and has any recommendations.


December 18th, 2018 04:00

Well, it's interesting that the systems we are having problems with all have 2x E5-2667 v2 (Model 62 Stepping 4). I have spent hours testing, reinstalling, and running a whole mess of CPU, memory, and other stress-testing tools, and found nothing. It feels so much like a Windows driver/patch problem. Maybe it's specific to this model and stepping?

What is frustrating is that Windows never seems to report a crash dump, BSOD, or anything. It just reboots and gets stuck on the "F1 to continue" BIOS screen because of the CPU0704 error in the BIOS logs.

I checked again for updates using the Lifecycle Controller maybe a week ago, and there were none. Now I see that driver bundle v680 is out, so I am pulling that into our WSUS servers to test with.


December 20th, 2018 13:00

In our case this only happens during reboot/bootup sequences. Once the system is running, it seems to be stable. We ran a stress-test PowerShell script for 8+ hours without a single issue. Then we rebooted it, and during the boot sequence it (sometimes) logs CPU0704 or CPU0000 errors, and sometimes also goes through random extra CPU resets or hard resets until it finally boots.


November 21st, 2019 04:00

Hello All,

We have a similar issue on a Dell R740xd: the error "CPU 2 machine check error detected".

We have applied the BIOS and firmware updates listed below, but the server still crashed with the error "CPU 2 machine check error detected". I have changed the server profile to max performance.

Is there any solution to fix this, please?

Name | InstalledVersion | AvailableVersion | Criticality
Dell EMC Server PowerEdge BIOS R740/R740xd/R640/R940/7920R Version 2.3.10 | 2.2.11 | 2.3.10 | Urgent
Non-expander Storage Backplane Firmware | 4.27 | 4.32 | Urgent
iDRAC with Lifecycle Controller 3.36.36.36 | 3.34.34.34 | 3.36.36.36 | Urgent
Intel NIC Family Version 19.0.0 Firmware for X710, XXV710, and XL710 adapters | 18.3.0 | 19.0.12 | Urgent
PERC H740P/H840 RAID Controller Firmware 50.5.1-2818 | 50.3.0-1022 | 50.5.1-2818 | Optional
Intel DL65 for model number(s) SSDSC2KB240G8T, SSDSC2KB480G8T, SSDSC2KB960G8T, SSDSC2KB240G8R, SSDSC2KB480G8R, SSDSC2KB960G8R, SSDSC2KB019T8R, SSDSC2KB038T8R, SSDSC2KG240G8T, SSDSC2KG480G8T, SSDSC2KG960G8T, SSDSC2KG240G8R, SSDSC2KG480G8R, SSDSC2KG960G8R, SSDSC2KG019T8R, SSDSC2KG038T8R, SSDSC2KG076T8R | DL63 | DL65 | Urgent
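As posted, every row of the listing above shows an installed version older than the available one, so it may have been captured before the updates were applied. A minimal Python sketch for double-checking a report like this after updating is shown here. The shortened component names are abbreviations introduced for illustration, and comparing versions by their numeric parts is an assumption that happens to fit these six version strings, not a general Dell versioning rule.

```python
import re


def version_key(version):
    """Turn '50.3.0-1022' or 'DL63' into a tuple of ints for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


# (component, installed, available, criticality), copied from the listing.
# Component names are shortened here for readability.
rows = [
    ("BIOS", "2.2.11", "2.3.10", "Urgent"),
    ("Storage backplane firmware", "4.27", "4.32", "Urgent"),
    ("iDRAC with Lifecycle Controller", "3.34.34.34", "3.36.36.36", "Urgent"),
    ("Intel NIC firmware", "18.3.0", "19.0.12", "Urgent"),
    ("PERC H740P/H840 firmware", "50.3.0-1022", "50.5.1-2818", "Optional"),
    ("Intel SSD firmware", "DL63", "DL65", "Urgent"),
]


def pending_updates(rows, criticality=None):
    """List components whose installed version is older than the available one."""
    return [
        name
        for name, installed, available, level in rows
        if version_key(installed) < version_key(available)
        and (criticality is None or level == criticality)
    ]


print(pending_updates(rows))
print(pending_updates(rows, criticality="Urgent"))
```

If the list is empty after re-running the inventory, the updates have taken effect and firmware level can be ruled out as the remaining variable.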


July 13th, 2020 09:00

Had the same issue today with an R720xd (BIOS 2.7.0), and the server was rebooting. I changed the BIOS settings accordingly, and now the server boots fine. I hope this fixes it and there are no further crashes.

 

Thanks.

Moderator


July 14th, 2020 10:00

Hello hharun,

 

The steps at the link you have are good ones to follow. I also recommend updating iDRAC to the latest version:

 

iDRAC with Lifecycle Controller v. 2.65.65.65

https://dell.to/3fyuVPy

 

 

Along with the steps on that page you may also:

-Reseat the DIMMs

-Swap processor sockets

 

 

 

Steps for running Diagnostics:

Press F11 at the Dell splash screen, then select Boot Manager -> System Utilities -> Launch Dell Diagnostics. Note any messages and continue testing.

-You may get a message that looks like a failed test but is only a notification that the Event Log contains failing records that need to be reviewed.

 

 

Please let me know how things go and if there is anything I can assist you with.

 

DELL-Charles R

Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#IWork4Dell



July 14th, 2020 10:00

I have an R720 that has been running well for many years. A couple of months ago, I updated the host OS from Red Hat 6 to Red Hat 7 and upgraded the BIOS to the latest (2.9.0) and the Lifecycle firmware (2.41.40.40).

This machine went dead a couple of days ago with "CPU2 Machine check error".

Will be doing all the things listed on

<ADMIN NOTE: Broken link has been removed from this post by Dell>

This thread was originally started in 2017. Did anyone manage to get an update from Dell regarding this issue?

 


June 3rd, 2022 03:00

I've just upgraded my Dell R630's CPU to an E5-2696 v4, and within the first day I got the same error and my machine forced a reboot.

CPU0704: CPU 1 machine check error detected.

I have run the diagnostic tool and found no issues.

I don't know if it is related, but I also got this error/warning:

PWR2262: The Intel Management Engine has reported an internal system error.

Any advice on how to further debug this issue, or should I just ignore it?

I'm running BIOS version 2.13.0 and Lifecycle Controller firmware 2.83.83.83. My Dell R630 is running with a single CPU.

Moderator


June 3rd, 2022 04:00

Hello. First, please do a power drain of your system as follows.

1. Power the server down.
2. Disconnect all power cables and network cables from the server.
3. Hold down the power button continuously for at least 10 seconds.
4. Reconnect the power cables and network cables to the system.
5. Wait about 2 minutes before powering on the server to give the iDRAC time to initialize.
6. Power the system on.

 

PWR2262 "The Intel Management Engine has reported an internal system error" usually indicates a communication issue between the iDRAC and the Intel Management Engine. BIOS and iDRAC updates can often fix it. Other than that, you can also try changing the BIOS system profile to Custom and disabling C States, or choosing a profile that has C States disabled. There is also a setting to configure QPI Link Power Management (System BIOS Settings > Processor Settings). The default is "Enabled"; change this setting to "Disabled".

 

For the CPU machine check error, please check this detailed article: https://dell.to/3Q2pnPR

 

Hope that helps!

DELL-Erman O



June 3rd, 2022 06:00

Thanks. I have followed your steps now. Is there a way to further test, challenge or "provoke" the CPU for testing?

Moderator


June 3rd, 2022 07:00

Hi, there is a Support Live Image (SLI). It is CentOS-based and includes many useful tools, such as a stress tool and a CPU test. I think the SLI would be helpful for further tests if you want.

 

Dell Support Live Image Version 3.0 https://dell.to/3915SXx

 

Support Live Image Version 3.0 User's Guide https://dell.to/3NTjvqe

DELL-Erman O



June 8th, 2022 00:00

Thanks for the links. I'm still experiencing the error and random reboots. Here is what I have done so far:

  1. Power drained my system
  2. Updated to the latest BIOS and Lifecycle Controller firmware
  3. Reset my BIOS to factory settings
  4. Ran the Lifecycle Controller diagnostic test (no errors)
  5. Booted the Dell Support Live Image ISO:
    1. Ran: sudo stress --cpu 44 --timeout 30s --verbose
    2. Ran the Intel Processor Diagnostic Tool

Every time I run the Intel Processor Diagnostic Tool, the system reboots at 8% (see the screenshot taken just before the reboot).

[Screenshot: intel.png]

If I change the System Profile setting to Performance, which has C-states disabled, then the Intel Processor Diagnostic Tool passes with no errors.
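Since the failure seems tied to C-states rather than to sustained load, one thing that could be tried is a load pattern that alternates short busy bursts with idle gaps, so the core repeatedly has the chance to enter and leave idle states. The Python sketch below is only an illustration of that idea under stated assumptions: the function name and timings are made up, and nothing here is a Dell or Intel diagnostic or is known to reproduce the fault.

```python
import time


def burst_load(busy_seconds, idle_seconds, cycles):
    """Alternate busy loops and sleeps; return the number of completed cycles.

    During each sleep the OS may let the core drop into an idle state
    (if C-states are enabled); each busy loop wakes it up again.
    """
    completed = 0
    for _ in range(cycles):
        deadline = time.perf_counter() + busy_seconds
        counter = 0
        while time.perf_counter() < deadline:
            counter += 1  # pointless arithmetic, just to keep the core busy
        time.sleep(idle_seconds)
        completed += 1
    return completed
```

For example, `burst_load(0.5, 0.5, 600)` would cycle for about ten minutes on one core. Starting one copy per logical CPU would exercise all cores, and comparing behavior with C-states enabled and disabled would show whether the reboots follow the setting.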

So what should I take from this? Shouldn't I be able to run this processor with System Profiles other than Performance?

Could it be a CPU issue? I'm still within the warranty period of the CPU. I never experienced this with my previous CPU (E5-2670 v3).


June 8th, 2022 00:00

Just to add further: even with C-states disabled, it still randomly reboots. Now I have also disabled C1E (I guess it is also part of the C-state settings?). So, let's see if the error reappears.
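When booted into a Linux environment such as the Support Live Image, the idle states the running kernel actually exposes can be checked through the standard cpuidle sysfs directory. The Python sketch below reads each state's `name` and `usage` files for one CPU. Whether BIOS changes show up here depends on which idle driver the kernel uses, so treat the output as a hint and not as proof of the BIOS setting.

```python
from pathlib import Path


def list_idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return [(state_name, times_entered), ...] for one CPU, in state order.

    Returns an empty list when the directory is missing (for example, no
    cpuidle driver is loaded or the system is not Linux).
    """
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []
    state_dirs = sorted(
        base.glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        usage = int((state_dir / "usage").read_text())
        states.append((name, usage))
    return states


if __name__ == "__main__":
    for name, usage in list_idle_states():
        print(f"{name}: entered {usage} times")
```

If a state named C1E is listed and its usage count keeps growing between two runs, the core is still entering it.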


June 10th, 2022 00:00

Just to follow up: disabling C1E solved the issue. But my question is, what am I missing out on by disabling C1E and C-states?

And shouldn't I be able to have these settings enabled? I just purchased the new CPU and if it is broken I would like to return it within the warranty period.

Moderator


June 10th, 2022 02:00

Hi @runevn,

 

Here's the description on C states: https://dell.to/3Q8Pipn

 

C states are mainly for power saving, and for servers it is recommended to leave the settings at their non-power-saving values. The option is available because some server administrators want the feature as a requirement for their data centers.

DELL-Joey C



June 22nd, 2022 03:00

Thanks for the links. It makes sense that in a datacenter context, C-states shouldn't be enabled.

I would like to ask if you have any idea why I get the error when C1E is enabled. As I wrote before, I am able to return the CPU if it is defective.

So, what is recommended that I do now? Should I return it or is it a software issue?

Thanks in advance.
