Unsolved

February 2nd, 2015 14:00

R710: A bus fatal error was detected on a component at bus 3 device 0 function 0

Hello

We have an R710 that crashed over the weekend. The system event log shows the following errors:

A bus fatal error was detected on a component at bus 0 device 4 function 0.

A bus fatal error was detected on a component at bus 3 device 0 function 0.

I ran lspci and got the following:


00:00:04.0 PCI bridge Bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4 [PCIe RP[00:00:04.0]]
Class 0604: 8086:340b


00:03:00.0 RAID bus controller Mass storage controller: LSI Logic / Symbios Logic Dell PERC 6/i Integrated [vmhba1]
Class 0104: 1000:0060
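
For readers matching the event log to the lspci output: the bus/device/function triple in each SEL message corresponds to the tail of the lspci address (bus 3 device 0 function 0 is 03:00.0, the PERC 6/i). Below is a minimal Python sketch of that mapping. The helper names `sel_to_bdf` and `find_device` are hypothetical, and it assumes the SEL prints the numbers in decimal while lspci prints them in hexadecimal, which makes no difference for the small values seen here.

```python
import re

# Matches the "bus N device N function N" part of the SEL message.
SEL_PATTERN = re.compile(r"bus (\d+) device (\d+) function (\d+)")


def sel_to_bdf(message):
    """Turn a SEL message into an lspci-style 'bb:dd.f' string, or None."""
    match = SEL_PATTERN.search(message)
    if match is None:
        return None
    bus, dev, func = (int(x) for x in match.groups())
    # Assumption: SEL values are decimal; lspci shows hexadecimal.
    return f"{bus:02x}:{dev:02x}.{func:x}"


def find_device(bdf, lspci_lines):
    """Return the first lspci line whose address ends with the given bb:dd.f."""
    for line in lspci_lines:
        fields = line.split(maxsplit=1)
        if fields and fields[0].endswith(bdf):
            return line
    return None


if __name__ == "__main__":
    lspci_lines = [
        "00:00:04.0 PCI bridge Bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4",
        "00:03:00.0 RAID bus controller Mass storage controller: LSI Logic / Symbios Logic Dell PERC 6/i Integrated [vmhba1]",
    ]
    sel = "A bus fatal error was detected on a component at bus 3 device 0 function 0."
    print(find_device(sel_to_bdf(sel), lspci_lines))
```

Matching on the end of the address keeps the sketch independent of how many digits the leading segment field has in a given lspci build.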

I rebooted the R710 and it came up ok.  What's my next step to resolve this?  Do I have to replace the motherboard?  Can I ignore it since the server came back up?  Guidance and suggestions are appreciated.

 

Thanks,

 

Scott

Moderator

 • 

8.6K Posts

February 3rd, 2015 05:00

OneMoreUsername,

Since the server is online, start by updating the BIOS, DRAC, and PERC 6/i firmware. After that, power down the server and remove the PERC 6/i, as seen below.

Then reseat Riser 1, which is the riser the controller was attached to. From there, it depends on how often the error occurs. If the PCIe error keeps recurring, leave the controller out when you power on and see whether the error is still present. If the PCIe error is not recurring, go ahead and reconnect the PERC 6/i and power up the server.

If you still get the error and happen to have another R710, try swapping Riser 1 with a known-good one. If that fails to resolve the issue, the cause is likely the PERC 6/i or the motherboard itself. So if you didn't do so earlier, try running without the controller and see whether the error stops. This should help you determine which piece of hardware is causing the issue.

Let me know what you are seeing.
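
The isolation steps above can be summarized as a small decision function. This is only a rough Python sketch of the reasoning, not a Dell procedure. The function name `likely_culprit` and its parameters are hypothetical, and the "motherboard" and "Riser 1 or motherboard" outcomes are inferred from the process of elimination rather than stated outright.

```python
def likely_culprit(error_after_reseat,
                   error_with_known_good_riser=None,
                   error_without_controller=None):
    """Suggest a suspect from test observations.

    Each argument is True (error seen), False (no error), or None
    (test not performed, e.g. no spare R710 to borrow a riser from).
    """
    if not error_after_reseat:
        # Updates and reseating were enough.
        return "poor connection or outdated firmware"
    if error_with_known_good_riser is False:
        # Swapping in a known-good riser cleared the error.
        return "Riser 1"
    if error_without_controller is None:
        return "undetermined: test without the PERC 6/i"
    if not error_without_controller:
        # Error disappears once the controller is removed.
        return "PERC 6/i"
    if error_with_known_good_riser:
        # Riser and controller both ruled out (inferred by elimination).
        return "motherboard"
    # No riser swap was done, so the riser cannot be ruled out.
    return "Riser 1 or motherboard"


if __name__ == "__main__":
    print(likely_culprit(True, True, False))
```

Using None for a test that was never run keeps "not tested" distinct from "tested and no error", which matters when no spare riser is available.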

February 3rd, 2015 14:00

Thanks for your quick reply Chris.

I'm reviewing my options for performing the updates to the BIOS, DRAC, and PERC 6/i that you recommend. I will probably schedule them for this weekend.

Thank you also for the great picture detailing the PERC 6/i and Riser 1.

When you mention the option of running without the controller, I'm assuming that means I will not be able to access the RAID volume, correct? If that's the case, there's not much use in doing so: we only use local storage on this host, so without access to the RAID volume the server can't perform any tasks.

So far the error has not come up again. You mention options based on what continues to produce the error codes we saw previously. What is your interpretation of the lack of ongoing errors? Do these things happen from time to time without serious consequence, or is this an indication that I have a serious hardware issue that needs to be addressed?

Thanks!

Scott

Moderator

February 4th, 2015 07:00

Sorry, when I referred to "running without the PERC" I meant for troubleshooting purposes only. The idea is to test the riser both with the controller and without it. This helps identify whether the issue is a bad riser.

Since the error is not recurring, I believe it is due either to the system being out of date on updates or to a poor connection somewhere along the controller-riser-motherboard path. Reseating would resolve the latter, and the updates the former.

I would update and see what happens before getting too worried. If it repeats, let me know.

February 9th, 2015 11:00

Hi Chris

I tried to update the BIOS, PERC, and DRAC as recommended this weekend but ran into some trouble.

I tried to use Lifecycle Controller by hitting F10 at boot, but when it started to load it generated a script error. I had created a bootable SBUU DVD, so I decided to go that route. However, when I tried to have it access the repository I had created on a separate DVD for SUU, it wouldn't recognize the repository.

I decided to try repairing Lifecycle Controller by following the instructions in the doc Repairing or Updating the Lifecycle Controller (LCC) and Unified Server Configurator (USC) linked at http://www.dell.com/support/article/us/en/19/SLN85572/EN

This worked and I was able to get Lifecycle Controller to load. I had it connect to ftp.dell.com but got the message that no updates are available, even though I can see that I do not have the latest BIOS, PERC 6/i, or DRAC versions. I then tried using Lifecycle Controller to update from the SUU I had created. I tried to apply all updates, but after an hour nothing was happening. Since there was no disc or network activity, I rebooted and went back to Lifecycle Controller. This time I tried to run just the BIOS and Lifecycle Controller updates. The Lifecycle Controller update ran, but when the BIOS update tried to run I got the message "Return code mismatch for BiosWrapper.efi - (0xb0000001) !"

I tried a few more times but still got return code mismatch errors. At this point I gave up.

What is the best way for me to update the BIOS, DRAC, and PERC 6/i? I would like to use Lifecycle Controller and FTP, but it says there are no updates. The repository I can create on DVD does not appear to be valid.

Thanks,

Scott

Moderator

February 10th, 2015 06:00

If this server is Windows-based, run the "Windows Update Package" version of each update from within the OS, using the links I provided earlier. Each package launches its update from the OS, applies it, and asks to reboot. Once the server reboots, the update should be in place.

February 10th, 2015 09:00

Hi Chris

The server runs ESXi 5.1, which is why I'm not able to use the Windows Update Packages. Is there an up-to-date set of files that I can access, either by creating a repository or from FTP?

Thanks,

Scott

Moderator

February 10th, 2015 10:00

1more,

Download the Repository Manager. With it you can extract the repository from the SUU. You can then use that repository from a USB drive, either in Lifecycle Controller under Platform Update or by booting to the BUU and selecting the same option.

February 10th, 2015 10:00

Hi Chris

This is pretty much what I tried, but I kept getting return code mismatch errors. This makes me think that the files generated from the repository are not valid. Do you know how I can resolve the return code errors?

I've attached a screenshot for reference.

Thanks,

Scott

1 Attachment
