Hi. Welcome to our iDRAC training series. In this video, we'll be covering how to troubleshoot connectivity problems with the iDRAC. The first troubleshooting step starts with asking a few basic questions. Let's take a look at the ones below. Can you ping the iDRAC? This will let us know if we have that very basic connectivity. We should also keep in mind that ping is required by many monitoring tools. An example would be if discovery of the device fails, but you can access the web interface. Next would be, do other remote protocols respond? This would be things like SSH, using PuTTY or another SSH client.
This would be helpful to know if, say, the web interface is not responding. Next would be, are the fans spinning at 100%? You may get a customer complaining that the server suddenly has very loud fans. That can lead us to our next question, which would be, do we get an error during POST about initialization? If we see this message, that is a clear indicator that we have some kind of internal issue with the iDRAC. If we can get to the iDRAC web page, one of the first things to try is an iDRAC reset, as a reboot will often resolve a lot of issues. If we can't get to the web page and we are getting initialization errors, we need to have the customer drain flea power.
To do this, we need to unplug the server from power for about five minutes. Plug the system back in and check to see if we still have initialization errors or problems with the iDRAC. The next troubleshooting steps would be to check the iDRAC configuration. The first thing we'll try is connecting to the iDRAC CLI and running "racadm getniccfg". Here, we can see the link-detected status, which NIC is in use, as well as the IP configuration. Next, run the command "racadm get idrac.nic.vlanenable" to see if a VLAN may be incorrectly configured. Next, let's take a look at the BIOS menu settings for the iDRAC.
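Before moving on to the BIOS menu, here is a rough sketch of how the two RACADM checks above could be scripted from a workstation. It assumes remote RACADM is installed locally and uses its "-r", "-u", and "-p" options, which is a different route from the SSH session used in the video. The function names are made up for illustration, and the parser only assumes the output contains lines in a "Key = Value" shape.

```python
import subprocess


def build_racadm_command(host, user, password, *subcommand):
    """Build a remote racadm invocation as an argument list (no shell quoting needed)."""
    return ["racadm", "-r", host, "-u", user, "-p", password, *subcommand]


def parse_key_values(text):
    """Parse 'Key = Value' lines into a dict; lines without '=' (headers, blanks) are skipped."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def run_racadm(host, user, password, *subcommand, timeout=60):
    """Run racadm and return its parsed output. Assumes the racadm binary is on PATH."""
    cmd = build_racadm_command(host, user, password, *subcommand)
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return parse_key_values(result.stdout)
```

As a usage example, run_racadm("192.0.2.10", "root", "<password>", "getniccfg") would return the NIC settings as a dictionary, and run_racadm("192.0.2.10", "root", "<password>", "get", "idrac.nic.vlanenable") would do the same for the VLAN attribute. The address and credentials here are placeholders. Keep in mind that a password passed on the command line can be visible to other users on the workstation.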
From the F2 "System Setup" menu, click on "iDRAC Settings". From the "Settings" menu, click on the "Network" section. Here, we can view the same settings as before from the SSH console. We can configure the NIC selection, as well as scroll down to see the IP config. If we scroll to the bottom, we can also see VLAN settings and verify that they are configured correctly. The next option is to collect a TSR, otherwise known as the SupportAssist Collection. Click on "Maintenance", "SupportAssist", then "Start a Collection". On the data to collect, it is often helpful to ensure the "Debug Logs" box is checked. Once your selections are complete, click the "Collect" button. This will take a few minutes to complete. If "Debug Logs" was checked, this could take 10 to 15 minutes to finish.
Once complete, save the bundle to the desktop. Once the bundle has been downloaded, drag it to the Tesseract program, or you can extract and view the HTML inside. Click on "Summary" and choose the "Raw" option. From here, we can do a search for "ipv4". Now, we can check and see what IP address is listed, either under "Static" or "DHCP". We can confirm the IP address, the gateway, and the netmask. Now, let's try searching for "nic.1". We can check for the active NIC section and see if it's enabled. If we scroll down, we can also check the VLAN ID and see if the VLAN is enabled. If the config looks good or if the active NIC does not match the NIC selection, reboot the iDRAC by either running the RACADM command "racreset" or holding the "i" button for 15 seconds if someone is on site.
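If the raw view has been saved out as plain text, the same two searches can be done with a few lines of Python. This is only a sketch: the function names are made up, and it makes no assumption about the layout of the raw data beyond it being line-oriented text. The NIC comparison is a simple case-insensitive string match, which is a simplification of the check described above.

```python
def find_lines(raw_text, keyword):
    """Return (line_number, stripped_line) pairs for lines containing keyword, ignoring case."""
    needle = keyword.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(raw_text.splitlines(), start=1)
        if needle in line.lower()
    ]


def nic_mismatch(nic_selection, active_nic):
    """Return True when the configured NIC selection and the active NIC differ."""
    return nic_selection.strip().lower() != active_nic.strip().lower()
```

For example, after reading the saved text into a variable called raw_text, find_lines(raw_text, "ipv4") and find_lines(raw_text, "nic.1") list the lines to review for the IP address, gateway, netmask, and VLAN settings. If nic_mismatch returns True for the two NIC values, that points to the iDRAC reboot described above.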
Now, let's look at some key steps to narrow down the issue. First would be, can the iDRAC ping the gateway? To test this, you could connect to the iDRAC through SSH and run the command "racadm ping", and then type the appropriate address needed. Next, does the iDRAC work from a second system? Often, the issue is with the local workstation we are trying to connect from. It could be a browser issue, a permissions issue, or other network problems. Something else to ask is, are the link lights active on the iDRAC port? You would have to have someone on site to verify this, but it can help confirm if the NIC is showing any signs of power or connectivity. What happens if we change the iDRAC port? It's possible that one of the ports has simply failed.
By changing which port the iDRAC communicates on, we can confirm if the issue is isolated to a single network port, or if there's something else going on with the system. Last on our list, does iDRAC Direct or plugging directly into a laptop provide access? Often we are faced with issues that are beyond our control and inside the customer's network. It could be a routing issue or a firewall between sites blocking communication. If we cannot get to the iDRAC from a remote system, we need to try testing locally through a direct connection to the iDRAC.
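To compare what a second workstation or a directly connected laptop can reach, a quick TCP check is sometimes handy. The sketch below is one possible way to do it, assuming SSH is on port 22 and the web interface is on port 443. Those are the usual defaults, but they can be changed on the iDRAC. The function names and the wording of the hints are our own.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def classify(ssh_ok, https_ok):
    """Turn the two probe results into a hint about where to look next."""
    if ssh_ok and https_ok:
        return "Both answer: suspect the browser or the local workstation."
    if ssh_ok:
        return "SSH only: check whether the iDRAC web server is enabled."
    if https_ok:
        return "HTTPS only: the web interface should be reachable; check SSH settings."
    return "Neither answers: check link lights, NIC selection, VLAN, or connect directly."


def check_idrac(host, ssh_port=22, https_port=443):
    """Probe both ports from this machine and return the matching hint."""
    return classify(port_open(host, ssh_port), port_open(host, https_port))
```

Running check_idrac("192.0.2.10") from two different machines, where the address is a placeholder, and comparing the hints gives a quick read on whether the problem follows the workstation or the iDRAC.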
If needed, we could also reset the iDRAC or do further testing while on site. I'll start off this last section by covering a less-seen issue where the iDRAC web server is not enabled. This can sometimes happen when the iDRAC is misconfigured in an attempt to increase network security. Connect to the iDRAC through SSH and type the RACADM command "get idrac.webserver". We can see from the output that the "Enable" attribute is set to "Disabled". That is why SSH works, but the web browser does not. To resolve this, type the RACADM command "set idrac.webserver.enable enabled". Now, we can see that the web server is enabled. If we refresh our browser, we can now connect to the web page. Now, let's take a look at some of our last-resort options.
If we are left with wiping the config of the iDRAC, but we still have connectivity to it, then we can run the RACADM command "racresetcfg". Now, if we have a 13th generation system, this will reset everything to factory settings. If we are looking at the more current 14th generation, it will reset everything except for the network and root user accounts. If you would like to reset those as well, that is when you will need to add the optional "-rc" switch to the end. The next option we have is to power-cycle the server and then choose the F10 boot option so that we boot to the Lifecycle Controller. Once there, we need to choose the "Hardware Configuration" menu. Then choose "Repurpose or Retire System". From here, we can see that there are multiple options to be reset.
We can just choose the "iDRAC" one and click "Next". The third option we have is to boot to the F2 BIOS menu. From here, we choose "iDRAC Settings". We then scroll to the bottom of the page. This is a 14th generation system, so we see the two different reset options. This last option is to utilize the LC wipe method. This is essentially a retire and repurpose that is sent to the iDRAC remotely. All systems will be reset to factory settings, not just the iDRAC. A backup of the customer's configuration and a downloaded copy of their iDRAC license will be needed.
It would be wise to always have this data when utilizing a reset, but specifically with the LC wipe option, it will delete the license from the system. In order for this to work, we need a Windows system that can reach the iDRAC remotely. Type the below winrm command. You will need to replace the items in brackets with the correct username, password, and iDRAC IP. Once this is sent to the iDRAC, it will take approximately ten minutes for the system to finish resetting and rebooting the host.
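Since the LC wipe deletes the license, it can help to confirm that both the configuration backup and the license copy are actually on disk before sending the command. Here is a minimal sketch of such a check. The function name and the file paths are hypothetical, and it only verifies that each file exists and is not empty, not that its contents are valid.

```python
from pathlib import Path


def missing_prerequisites(config_backup, license_copy):
    """Return a list of problems; an empty list means both files exist and are non-empty."""
    problems = []
    checks = (
        ("configuration backup", config_backup),
        ("iDRAC license copy", license_copy),
    )
    for label, path in checks:
        candidate = Path(path)
        if not candidate.is_file():
            problems.append(f"{label} not found: {candidate}")
        elif candidate.stat().st_size == 0:
            problems.append(f"{label} is empty: {candidate}")
    return problems
```

If missing_prerequisites("backup/server_config.xml", "backup/idrac_license.xml") returns anything other than an empty list, it would be wise to hold off on the wipe until the missing data has been collected. Both file names here are placeholders.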
Thank you for taking the time to review this video on troubleshooting connectivity for the iDRAC.