October 14th, 2005 18:00

Watchdog timer and linux on PE1850

Where can I get some information on using the hardware watchdog timer (which I think does exist) in a Red Hat Enterprise Linux 3.0 environment on a PE 1850 or PE SC1425? I think the timer is related to the BMC function on the box, but I can't find any specific documentation. I'd like the box to auto-reboot if my application goes out to lunch and doesn't refresh the timer.

Any pointers are welcome.

Thanks

2 Intern

 • 

815 Posts

October 17th, 2005 18:00

This may help. First create the watchdog device node:
cd /dev
./MAKEDEV watchdog
The NMI watchdog is enabled on supported systems by adding nmi_watchdog=1 to the kernel's command line.
Referenced from:
http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/cluster-suite/ap-hwinfo.html
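
After rebooting, it can be handy to confirm that the parameter actually reached the kernel. A minimal sketch, assuming the running kernel's command line is readable from /proc/cmdline; the function name nmi_watchdog_setting is made up for illustration and is not part of any Red Hat or Dell tool:

```shell
# Print the value of the nmi_watchdog= parameter found on a kernel command line.
# $1: file holding the command line (defaults to /proc/cmdline).
# Prints nothing if the parameter is absent.
nmi_watchdog_setting() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each space-separated parameter on its own line, then keep the
    # value of the last nmi_watchdog=... entry.
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^nmi_watchdog=//p' | tail -n 1
}
```

Calling nmi_watchdog_setting with no argument reads /proc/cmdline; empty output would suggest the grub or lilo entry was not picked up. Running ls -l /dev/watchdog separately shows whether the MAKEDEV step created the device node.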

October 17th, 2005 23:00

Thanks for the reply and the link on the RHEL site.

In section D.1.2.3, Configuring a Hardware Watchdog Timer, it mentions that it's hard to tell whether a system has a hardware timer. I was wondering if the Dell PE1850 has one and, if so, what value I should use in the alias command for it.

Thanks

2 Intern

 • 

815 Posts

October 18th, 2005 11:00

Once you have the watchdog device and nmi_watchdog=1 in grub or lilo, you should reboot. Then cat /proc/interrupts. You should see a line for NMI. If NMI has a number next to it, then the hardware watchdog is enabled (the number will vary). The 1850, as well as all rack-mount PowerEdge servers, has a hardware watchdog.
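
The /proc/interrupts check above can be scripted. A minimal sketch, assuming the NMI line has the usual layout of a label followed by one count per CPU; the function name nmi_counts is made up for illustration:

```shell
# Print the per-CPU NMI counts from an interrupts table, separated by spaces.
# $1: file to read (defaults to /proc/interrupts).
nmi_counts() {
    awk '$1 == "NMI:" {
        # Collect the numeric fields after the label; stop at the first
        # non-numeric field (newer kernels append a text description).
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            printf "%s%s", (i > 2 ? " " : ""), $i
        print ""
    }' "${1:-/proc/interrupts}"
}
```

Running nmi_counts twice, a few seconds apart, gives two samples to compare. If the NMI watchdog is ticking, one would expect the counts to grow between samples; counts stuck at zero would suggest it is not active.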

2 Intern

 • 

815 Posts

October 18th, 2005 13:00

If you need more information on how to use this feature, you should review the kernel documentation.  nmi_watchdog is fully documented there.
 
From the nmi_watchdog kernel documentation:

http://fxr.watson.org/fxr/source/Documentation/nmi_watchdog.txt?v=linux-2.6.9

(NMI: Non Maskable Interrupt which get executed even if the system is otherwise locked up hard). This can be used to debug hard kernel lockups. By executing periodic NMI interrupts, the kernel can monitor whether any CPU has locked up, and print out debugging messages if so.

October 18th, 2005 13:00

From your description, would it be accurate to say that the operating system is the "program" that controls the timer? What I want is for my custom application to control the refresh/reset of the timer. A lot of the time the OS is fine but the application hangs, and the application is what I want to make sure is running.

That being the case, are there any docs on how to directly access and control the timer from a user application?

Thanks

Message Edited by emfjsullivan on 10-18-2005 10:08 AM
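
For reference, the usual pattern for an application-controlled timer on Linux is to hold /dev/watchdog open and write to it periodically, so that a hung application stops writing and the timer expires. A minimal sketch, assuming a watchdog driver is loaded behind /dev/watchdog (the thread does not confirm which driver, if any, serves the PE1850) and that the driver honors the "magic close" character V; the function name feed_watchdog and its bounded loop are made up for illustration:

```shell
# Feed a watchdog device while holding it open, then disarm it on exit.
# $1: device path, $2: seconds between writes, $3: number of writes.
# The fixed count is illustrative only; a real application would write
# after each successful unit of work for as long as it is healthy.
feed_watchdog() {
    dev="$1"; interval="$2"; count="$3"
    {
        n=0
        while [ "$n" -lt "$count" ]; do
            printf '.' >&3      # any write to the device resets the timer
            n=$((n + 1))
            sleep "$interval"
        done
        # Assumption: the driver supports magic close, so writing V just
        # before closing asks it to stop the timer instead of rebooting.
        printf 'V' >&3
    } 3> "$dev"                 # fd 3 stays open for the whole loop
}
```

A call such as feed_watchdog /dev/watchdog 10 6 would feed the timer every 10 seconds for a minute. Opening the real device arms the timer, so this should be tried only on a machine that can tolerate an unexpected reboot.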
