
Using Systemd for Automated System Recovery

Summary: With support for watchdog hardware now included, systemd can perform the function of a watchdog daemon for Linux on Dell PowerEdge systems.

Article Content

Instructions

With the inclusion of support for watchdog hardware, systemd can now perform the function of a watchdog daemon on Linux. On Dell PowerEdge systems, this hardware can be either the watchdog timer built into the platform’s chipset (such as Intel ICH9) or the IPMI-compliant BMC watchdog timer in Dell iDRAC.

Dell iDRAC provides Automated System Recovery which, in addition to recovering from operating system lock-ups, can capture a screenshot for later analysis. Enabling this previously required installing additional software on the operating system. With newer distributions that support systemd, the feature works with software available natively in the distribution, eliminating the need for add-on software.

It was, however, already possible to use the watchdogd daemon on Linux, but there was a risk that the daemon itself could lock up while the rest of the system remained operational. With systemd, the arrangement has two layers: systemd acts as the software watchdog for all system services, and the BMC watchdog timer acts as the hardware watchdog for systemd itself. If systemd is nonoperational, there is a good chance that the system is unusable in general. This gives a more reliable method in which all system services are watched by their manager (systemd), and systemd in turn is watched by the BMC’s watchdog timer.

The glue between systemd and Dell iDRAC’s BMC watchdog is the ipmi_watchdog kernel module, which provides Linux watchdog API access to the BMC watchdog through /dev/watchdog. systemd uses this interface to kick the watchdog periodically.
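As a rough illustration of this glue, the sketch below reports whether the ipmi_watchdog module appears in the loaded-module list and whether a watchdog device node exists. The function name watchdog_glue_status and its two optional path arguments are hypothetical, introduced here only so the check can be tried against sample files. It assumes the driver is built as a loadable module (a built-in driver would not be listed in /proc/modules).

```shell
#!/bin/sh
# Sketch only: watchdog_glue_status is a hypothetical helper, not part of
# systemd, OpenIPMI, or any Dell tool.
# $1: module list to inspect (default /proc/modules)
# $2: watchdog device node   (default /dev/watchdog)
watchdog_glue_status() {
    modules_file=${1:-/proc/modules}
    device=${2:-/dev/watchdog}

    # /proc/modules lists one loaded module per line, module name first.
    if grep -q '^ipmi_watchdog ' "$modules_file" 2>/dev/null; then
        echo "module: loaded"
    else
        echo "module: not loaded"
    fi

    # Only test that the node exists. The device is deliberately not opened,
    # because opening /dev/watchdog activates the timer.
    if [ -e "$device" ]; then
        echo "device: present"
    else
        echo "device: missing"
    fi
}

# Example (on the target system):
#   watchdog_glue_status
```

The existence test is used instead of reading from the device because the Linux watchdog API treats an open of /dev/watchdog as the signal to start counting down.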

Setting up systemd with ipmi_watchdog

systemd can be configured to use the iDRAC BMC watchdog with these steps (tested on Fedora 19):

  1. The system has two watchdog timers (chipset and BMC), and either can be used. This example uses the BMC watchdog and disables the chipset watchdog. The chipset watchdog can be disabled by setting the "OS Watchdog Timer" option in the System BIOS to "Disabled" (default).
  2. Choose a timeout value for the watchdog, for example 180 seconds.
  3. Enable the ipmi_watchdog kernel module to load at system startup with the timeout from above, using one of these methods:
    • Method 1: Create /etc/modules-load.d/ipmi_watchdog.conf containing the module name ipmi_watchdog, and create /etc/modprobe.d/ipmi_watchdog.conf with the following content (files in modules-load.d only list module names; options and blacklist directives are read from modprobe.d)
      • options ipmi_watchdog timeout=180
      • # Optional: needed only if the chipset watchdog is not disabled in BIOS setup.
      • blacklist iTCO_wdt
    • Method 2:
      • Install the OpenIPMI rpm
        • $ sudo yum install OpenIPMI
      • Set IPMI_WATCHDOG=yes and IPMI_WATCHDOG_OPTIONS with the timeout in /etc/sysconfig/ipmi.
      • Enable the ipmi service to start automatically
        • $ sudo systemctl enable ipmi
  4. Enable systemd’s watchdog:
    • Uncomment and set RuntimeWatchdogSec=180 in /etc/systemd/system.conf
  5. Restart systemd
    • $ sudo systemctl daemon-reexec
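
The RuntimeWatchdogSec edit described in the steps above could be scripted roughly as follows. This is a minimal sketch, not a supported procedure: set_runtime_watchdog is a hypothetical helper name, and it assumes the file already contains a RuntimeWatchdogSec= line (commented out or not), as the stock system.conf does.

```shell
#!/bin/sh
# Sketch only: set_runtime_watchdog is a hypothetical helper, not a systemd tool.
# $1: path to the systemd manager configuration file
# $2: timeout in seconds (digits only; the value is pasted into a sed expression)
set_runtime_watchdog() {
    conf=$1
    seconds=$2

    # Stop if the setting is absent, rather than guessing where to add it.
    grep -Eq '^#?RuntimeWatchdogSec=' "$conf" || return 1

    # Uncomment and set the value, writing through a temporary file so the
    # original is only replaced once the rewrite has succeeded.
    sed -E "s/^#?RuntimeWatchdogSec=.*/RuntimeWatchdogSec=${seconds}/" "$conf" \
        > "${conf}.tmp" && mv "${conf}.tmp" "$conf"
}

# Example (run as root on the target system, then re-execute systemd):
#   set_runtime_watchdog /etc/systemd/system.conf 180 && systemctl daemon-reexec
```

Trying the helper on a copy of system.conf first makes it easy to confirm that only the one line changes before touching the live file.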

Test if this works:

  1. Check if the watchdog is active
    • $ sudo journalctl | grep -i 'hardware watchdog'   # should show that systemd is set up to use the IPMI watchdog.
    • $ sudo ipmitool mc watchdog get        # check that it reports "Watchdog Timer Is: Started/Running".
  2. Test by simulating a kernel panic (do not do this on a production system). Ensure that kdump is disabled.
    • $ echo c | sudo tee /proc/sysrq-trigger   # with "sudo echo c > ...", the redirection runs as the unprivileged user and fails
  3. After system reset, verify that an image of the failure screen is available in the iDRAC
    • Log in to iDRAC web UI
    • Overview -> Server -> Troubleshooting -> Last Crash Screen.
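
The first verification step can be turned into a small check. The sketch below reads the output of ipmitool mc watchdog get on standard input and succeeds only if it contains the "Watchdog Timer Is: Started/Running" line quoted above. watchdog_running is a hypothetical helper name, and the amount of whitespace after the colon is an assumption, so the pattern accepts any amount.

```shell
#!/bin/sh
# Sketch only: watchdog_running is a hypothetical helper, not part of ipmitool.
# Reads `ipmitool mc watchdog get` output on stdin; exit status 0 means the
# BMC watchdog timer is reported as started.
watchdog_running() {
    grep -Eq '^Watchdog Timer Is:[[:space:]]*Started/Running'
}

# Example (on the target system):
#   sudo ipmitool mc watchdog get | watchdog_running && echo "BMC watchdog is armed"
```

Reading from standard input keeps the check usable on saved command output, without access to a BMC.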
NOTE: This feature is unsupported by Dell currently and is shared here with the intent of soliciting feedback from the community at Linux Resources for PowerEdge Servers.

Article Properties

Last Published Date: 19 Sep 2023

Version: 6

Article Type: How To