 | 1 | The Linux Watchdog driver API. | 
 | 2 |  | 
 | 3 | Copyright 2002 Christer Weingel <wingel@nano-system.com> | 
 | 4 |  | 
 | 5 | Some parts of this document are copied verbatim from the sbc60xxwdt | 
 | 6 | driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk> | 
 | 7 |  | 
 | 8 | This document describes the state of the Linux 2.4.18 kernel. | 
 | 9 |  | 
 | 10 | Introduction: | 
 | 11 |  | 
 | 12 | A Watchdog Timer (WDT) is a hardware circuit that can reset the | 
 | 13 | computer system in case of a software fault.  You probably knew that | 
 | 14 | already. | 
 | 15 |  | 
 | 16 | Usually a userspace daemon will notify the kernel watchdog driver via the | 
 | 17 | /dev/watchdog special device file that userspace is still alive, at | 
 | 18 | regular intervals.  When such a notification occurs, the driver will | 
 | 19 | usually tell the hardware watchdog that everything is in order, and | 
 | 20 | that the watchdog should wait for yet another little while to reset | 
 | 21 | the system.  If userspace fails (RAM error, kernel bug, whatever), the | 
 | 22 | notifications cease to occur, and the hardware watchdog will reset the | 
 | 23 | system (causing a reboot) after the timeout occurs. | 
 | 24 |  | 
 | 25 | The Linux watchdog API is a rather ad hoc construction and different | 
 | 26 | drivers implement different, and sometimes incompatible, parts of it. | 
 | 27 | This file is an attempt to document the existing usage and allow | 
 | 28 | future driver writers to use it as a reference. | 
 | 29 |  | 
 | 30 | The simplest API: | 
 | 31 |  | 
 | 32 | All drivers support the basic mode of operation, where the watchdog | 
 | 33 | activates as soon as /dev/watchdog is opened and will reboot unless | 
 | 34 | the watchdog is pinged within a certain time; this time is called the | 
 | 35 | timeout or margin.  The simplest way to ping the watchdog is to write | 
 | 36 | some data to the device.  So a very simple watchdog daemon would look | 
 | 37 | like this: | 
 | 38 |  | 
 | 39 | int main(int argc, const char *argv[]) { | 
 | 40 | 	int fd=open("/dev/watchdog",O_WRONLY); | 
 | 41 | 	if (fd==-1) { | 
 | 42 | 		perror("watchdog"); | 
 | 43 | 		exit(1); | 
 | 44 | 	} | 
 | 45 | 	while(1) { | 
 | 46 | 		write(fd, "\0", 1); | 
 | 47 | 		sleep(10); | 
 | 48 | 	} | 
 | 49 | } | 
 | 50 |  | 
 | 51 | A more advanced daemon could, for example, check that an HTTP server is | 
 | 52 | still responding before doing the write call to ping the watchdog. | 
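
A minimal sketch of such a health-checking ping follows.  The names
ping_if_healthy(), always_healthy() and never_healthy() are hypothetical
and not part of any driver or library; a real daemon would replace the
stub checks with something like an HTTP probe.

```c
#include <unistd.h>

/* Hypothetical callback type: returns nonzero when the service is healthy. */
typedef int (*health_check_fn)(void);

/* Stub checks for illustration only; a real daemon would probe its service. */
static int always_healthy(void) { return 1; }
static int never_healthy(void) { return 0; }

/*
 * Ping the watchdog on fd only if check() reports success.
 * Returns 1 if a ping was written, 0 if the check failed (no ping
 * was sent), and -1 if the write itself failed.
 */
static int ping_if_healthy(int fd, health_check_fn check)
{
	if (!check())
		return 0;
	if (write(fd, "\0", 1) != 1)
		return -1;
	return 1;
}
```

Calling ping_if_healthy() once per sleep interval in the main loop means
that a failing check simply lets the timeout expire, so the hardware
resets the system.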
 | 53 |  | 
 | 54 | When the device is closed, the watchdog is disabled.  This is not | 
 | 55 | always such a good idea, since if there is a bug in the watchdog | 
 | 56 | daemon and it crashes, the system will not reboot.  Because of this, | 
 | 57 | some of the drivers support the configuration option "Disable watchdog | 
 | 58 | shutdown on close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when | 
 | 59 | compiling the kernel, there is no way of disabling the watchdog once | 
 | 60 | it has been started.  So, if the watchdog daemon crashes, the system | 
 | 61 | will reboot after the timeout has passed. | 
 | 62 |  | 
 | 63 | Some other drivers will not disable the watchdog, unless a specific | 
 | 64 | magic character 'V' has been sent to /dev/watchdog just before closing | 
 | 65 | the file.  If the userspace daemon closes the file without sending | 
 | 66 | this special character, the driver will assume that the daemon (and | 
 | 67 | userspace in general) died, and will stop pinging the watchdog without | 
 | 68 | disabling it first.  This will then cause a reboot. | 
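
A daemon that wants to shut down without triggering a reboot on such
drivers could use something like the sketch below.  The helper name
watchdog_close_cleanly() is made up for this example, and whether the
'V' has any effect depends on the driver in use.

```c
#include <unistd.h>

/*
 * Write the magic character 'V' and then close the descriptor, so that
 * drivers implementing the magic close handling disable the watchdog
 * instead of letting it fire.  Returns 0 on success, -1 if either the
 * write or the close failed.
 */
static int watchdog_close_cleanly(int fd)
{
	int ret = 0;

	if (write(fd, "V", 1) != 1)
		ret = -1;
	if (close(fd) == -1)
		ret = -1;
	return ret;
}
```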
 | 69 |  | 
 | 70 | The ioctl API: | 
 | 71 |  | 
 | 72 | All conforming drivers also support an ioctl API. | 
 | 73 |  | 
 | 74 | Pinging the watchdog using an ioctl: | 
 | 75 |  | 
 | 76 | All drivers that have an ioctl interface support at least one ioctl, | 
 | 77 | KEEPALIVE.  This ioctl does exactly the same thing as a write to the | 
 | 78 | watchdog device, so the main loop in the above program could be | 
 | 79 | replaced with: | 
 | 80 |  | 
 | 81 | 	while (1) { | 
 | 82 | 		ioctl(fd, WDIOC_KEEPALIVE, 0); | 
 | 83 | 		sleep(10); | 
 | 84 | 	} | 
 | 85 |  | 
 | 86 | The argument to the ioctl is ignored. | 
 | 87 |  | 
 | 88 | Setting and getting the timeout: | 
 | 89 |  | 
 | 90 | For some drivers it is possible to modify the watchdog timeout on the | 
 | 91 | fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT | 
 | 92 | flag set in their option field.  The argument is an integer | 
 | 93 | representing the timeout in seconds.  The driver returns the real | 
 | 94 | timeout used in the same variable, and this timeout might differ from | 
 | 95 | the requested one due to limitations of the hardware. | 
 | 96 |  | 
 | 97 |     int timeout = 45; | 
 | 98 |     ioctl(fd, WDIOC_SETTIMEOUT, &timeout); | 
 | 99 |     printf("The timeout was set to %d seconds\n", timeout); | 
 | 100 |  | 
 | 101 | This example might actually print "The timeout was set to 60 seconds" | 
 | 102 | if the device has a granularity of minutes for its timeout. | 
 | 103 |  | 
 | 104 | Starting with the Linux 2.4.18 kernel, it is possible to query the | 
 | 105 | current timeout using the GETTIMEOUT ioctl. | 
 | 106 |  | 
 | 107 |     ioctl(fd, WDIOC_GETTIMEOUT, &timeout); | 
 | 108 |     printf("The timeout is %d seconds\n", timeout); | 
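
Since not every driver implements these two ioctls, a careful daemon
would check the return value before trusting the result.  The wrappers
below are a sketch; the names watchdog_set_timeout() and
watchdog_get_timeout() are hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Request a timeout of `seconds` and return the value the driver
 * actually chose, or -1 if the ioctl failed (for example because
 * the driver does not implement SETTIMEOUT).
 */
static int watchdog_set_timeout(int fd, int seconds)
{
	int timeout = seconds;

	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) == -1)
		return -1;
	return timeout;
}

/* Return the current timeout in seconds, or -1 if the ioctl failed. */
static int watchdog_get_timeout(int fd)
{
	int timeout;

	if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) == -1)
		return -1;
	return timeout;
}
```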
 | 109 |  | 
 | 110 | Environmental monitoring: | 
 | 111 |  | 
 | 112 | All watchdog drivers are required to return more information about the system; | 
 | 113 | some do temperature, fan and power level monitoring, some can tell you | 
 | 114 | the reason for the last reboot of the system.  The GETSUPPORT ioctl is | 
 | 115 | available to ask what the device can do: | 
 | 116 |  | 
 | 117 | 	struct watchdog_info ident; | 
 | 118 | 	ioctl(fd, WDIOC_GETSUPPORT, &ident); | 
 | 119 |  | 
 | 120 | The fields returned in the ident struct are: | 
 | 121 |  | 
 | 122 | 	identity		a string identifying the watchdog driver | 
 | 123 | 	firmware_version	the firmware version of the card if available | 
 | 124 | 	options			flags describing what the device supports | 
 | 125 |  | 
 | 126 | The options field can have the following bits set, and describes what | 
 | 127 | kind of information the GETSTATUS and GETBOOTSTATUS ioctls can | 
 | 128 | return.   [FIXME -- Is this correct?] | 
 | 129 |  | 
 | 130 | 	WDIOF_OVERHEAT		Reset due to CPU overheat | 
 | 131 |  | 
 | 132 | The machine was last rebooted by the watchdog because the thermal limit was | 
 | 133 | exceeded | 
 | 134 |  | 
 | 135 | 	WDIOF_FANFAULT		Fan failed | 
 | 136 |  | 
 | 137 | A system fan monitored by the watchdog card has failed | 
 | 138 |  | 
 | 139 | 	WDIOF_EXTERN1		External relay 1 | 
 | 140 |  | 
 | 141 | External monitoring relay/source 1 was triggered. Controllers intended for | 
 | 142 | real world applications include external monitoring pins that will trigger | 
 | 143 | a reset. | 
 | 144 |  | 
 | 145 | 	WDIOF_EXTERN2		External relay 2 | 
 | 146 |  | 
 | 147 | External monitoring relay/source 2 was triggered | 
 | 148 |  | 
 | 149 | 	WDIOF_POWERUNDER	Power bad/power fault | 
 | 150 |  | 
 | 151 | The machine is showing an undervoltage status | 
 | 152 |  | 
 | 153 | 	WDIOF_CARDRESET		Card previously reset the CPU | 
 | 154 |  | 
 | 155 | The last reboot was caused by the watchdog card | 
 | 156 |  | 
 | 157 | 	WDIOF_POWEROVER		Power over voltage | 
 | 158 |  | 
 | 159 | The machine is showing an overvoltage status. Note that if one level is | 
 | 160 | under and one over both bits will be set - this may seem odd but makes | 
 | 161 | sense. | 
 | 162 |  | 
 | 163 | 	WDIOF_KEEPALIVEPING	Keep alive ping reply | 
 | 164 |  | 
 | 165 | The watchdog saw a keepalive ping since it was last queried. | 
 | 166 |  | 
 | 167 | 	WDIOF_SETTIMEOUT	Can set/get the timeout | 
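
To test for a single capability, a program could wrap GETSUPPORT as in
the sketch below.  The helper name watchdog_supports() is an assumption
made for this example; only the ioctl, the struct and the WDIOF_ flags
come from <linux/watchdog.h>.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Ask the driver what it supports and test one WDIOF_ bit.
 * Returns 1 if the bit is set in the options field, 0 if it is not,
 * and -1 if the GETSUPPORT ioctl itself failed.
 */
static int watchdog_supports(int fd, unsigned int option)
{
	struct watchdog_info ident;

	memset(&ident, 0, sizeof(ident));
	if (ioctl(fd, WDIOC_GETSUPPORT, &ident) == -1)
		return -1;
	return (ident.options & option) ? 1 : 0;
}
```

For instance, watchdog_supports(fd, WDIOF_SETTIMEOUT) can be checked
before attempting to change the timeout.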
 | 168 |  | 
 | 169 |  | 
 | 170 | For those drivers that return any bits set in the option field, the | 
 | 171 | GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current | 
 | 172 | status, and the status at the last reboot, respectively.   | 
 | 173 |  | 
 | 174 |     int flags; | 
 | 175 |     ioctl(fd, WDIOC_GETSTATUS, &flags); | 
 | 176 |  | 
 | 177 |     or | 
 | 178 |  | 
 | 179 |     ioctl(fd, WDIOC_GETBOOTSTATUS, &flags); | 
 | 180 |  | 
 | 181 | Note that not all devices support these two calls, and some only | 
 | 182 | support the GETBOOTSTATUS call. | 
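
The returned flags word can be decoded bit by bit.  The following is a
sketch only: the table and the function print_watchdog_flags() are
invented for this example, and the descriptions are taken from the list
of option bits above.

```c
#include <stddef.h>
#include <stdio.h>
#include <linux/watchdog.h>

struct wd_flag_name {
	int bit;
	const char *name;
};

/* Status bits from <linux/watchdog.h> with short descriptions. */
static const struct wd_flag_name wd_flag_names[] = {
	{ WDIOF_OVERHEAT,	"reset due to CPU overheat" },
	{ WDIOF_FANFAULT,	"fan failed" },
	{ WDIOF_EXTERN1,	"external relay 1 triggered" },
	{ WDIOF_EXTERN2,	"external relay 2 triggered" },
	{ WDIOF_POWERUNDER,	"power undervoltage" },
	{ WDIOF_CARDRESET,	"card previously reset the CPU" },
	{ WDIOF_POWEROVER,	"power overvoltage" },
	{ WDIOF_KEEPALIVEPING,	"keepalive ping seen" },
};

/*
 * Print one line per recognised bit that is set in flags and
 * return the number of lines printed.
 */
static int print_watchdog_flags(FILE *out, int flags)
{
	size_t i;
	int count = 0;

	for (i = 0; i < sizeof(wd_flag_names) / sizeof(wd_flag_names[0]); i++) {
		if (flags & wd_flag_names[i].bit) {
			fprintf(out, "%s\n", wd_flag_names[i].name);
			count++;
		}
	}
	return count;
}
```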
 | 183 |  | 
 | 184 | Some drivers can measure the temperature using the GETTEMP ioctl.  The | 
 | 185 | returned value is the temperature in degrees Fahrenheit. | 
 | 186 |  | 
 | 187 |     int temperature; | 
 | 188 |     ioctl(fd, WDIOC_GETTEMP, &temperature); | 
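
Programs that prefer degrees Celsius can convert the value themselves.
The two helpers below are a sketch with made-up names, assuming the
driver really reports Fahrenheit as described above.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Convert whole degrees Fahrenheit to degrees Celsius. */
static double fahrenheit_to_celsius(int fahrenheit)
{
	return (fahrenheit - 32) * 5.0 / 9.0;
}

/*
 * Read the temperature through GETTEMP and store it in *celsius.
 * Returns 0 on success, or -1 if the ioctl failed (many drivers
 * have no temperature sensor), in which case *celsius is untouched.
 */
static int watchdog_get_temp_celsius(int fd, double *celsius)
{
	int temperature;

	if (ioctl(fd, WDIOC_GETTEMP, &temperature) == -1)
		return -1;
	*celsius = fahrenheit_to_celsius(temperature);
	return 0;
}
```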
 | 189 |  | 
 | 190 | Finally the SETOPTIONS ioctl can be used to control some aspects of | 
 | 191 | the card's operation; right now the pcwd driver is the only one | 
 | 192 | supporting this ioctl. | 
 | 193 |  | 
 | 194 |     int options = 0; | 
 | 195 |     ioctl(fd, WDIOC_SETOPTIONS, &options); | 
 | 196 |  | 
 | 197 | The following options are available: | 
 | 198 |  | 
 | 199 | 	WDIOS_DISABLECARD	Turn off the watchdog timer | 
 | 200 | 	WDIOS_ENABLECARD	Turn on the watchdog timer | 
 | 201 | 	WDIOS_TEMPPANIC		Kernel panic on temperature trip | 
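
As a sketch of how these options might be passed, the hypothetical
wrapper below hands one WDIOS_ value to the driver through a pointer.
Drivers without SETOPTIONS support are expected to fail the call.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Pass a set of WDIOS_ option bits to the driver.
 * Returns 0 on success, -1 if the ioctl failed.
 */
static int watchdog_set_options(int fd, int options)
{
	if (ioctl(fd, WDIOC_SETOPTIONS, &options) == -1)
		return -1;
	return 0;
}
```

For example, watchdog_set_options(fd, WDIOS_DISABLECARD) asks the driver
to turn the timer off, and WDIOS_ENABLECARD turns it back on.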
 | 202 |  | 
 | 203 | [FIXME -- better explanations] | 
 | 204 |  | 
 | 205 | Implementations in the current drivers in the kernel tree: | 
 | 206 |  | 
 | 207 | Here I have tried to summarize what the different drivers support and | 
 | 208 | where they do strange things compared to the other drivers. | 
 | 209 |  | 
 | 210 | acquirewdt.c -- Acquire Single Board Computer | 
 | 211 |  | 
 | 212 | 	This driver has a hardcoded timeout of 1 minute | 
 | 213 |  | 
 | 214 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 215 |  | 
 | 216 | 	GETSUPPORT returns WDIOF_KEEPALIVEPING.  GETSTATUS will return 1 if | 
 | 217 | 	the device is open, 0 if not.  [FIXME -- isn't this rather | 
 | 218 | 	silly?  To be able to use the ioctl, the device must be open | 
 | 219 | 	and so GETSTATUS will always return 1]. | 
 | 220 |  | 
 | 221 | advantechwdt.c -- Advantech Single Board Computer | 
 | 222 |  | 
 | 223 | 	Timeout that defaults to 60 seconds, supports SETTIMEOUT. | 
 | 224 |  | 
 | 225 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 226 |  | 
 | 227 | 	GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. | 
 | 228 | 	The GETSTATUS call returns if the device is open or not. | 
 | 229 | 	[FIXME -- silliness again?] | 
 | 230 | 	 | 
 | 231 | booke_wdt.c -- PowerPC BookE Watchdog Timer | 
 | 232 |  | 
 | 233 | 	Timeout default varies according to frequency, supports | 
 | 234 | 	SETTIMEOUT | 
 | 235 |  | 
 | 236 | 	Watchdog can not be turned off, CONFIG_WATCHDOG_NOWAYOUT | 
 | 237 | 	does not make sense | 
 | 238 |  | 
 | 239 | 	GETSUPPORT returns the watchdog_info struct, and | 
 | 240 | 	GETSTATUS returns the supported options. GETBOOTSTATUS | 
 | 241 | 	returns a 1 if the last reset was caused by the | 
 | 242 | 	watchdog and a 0 otherwise. This watchdog can not be | 
 | 243 | 	disabled once it has been started. The wdt_period kernel | 
 | 244 | 	parameter selects which bit of the time base changing | 
 | 245 | 	from 0->1 will trigger the watchdog exception. Changing | 
 | 246 | 	the timeout from the ioctl calls will change the | 
 | 247 | 	wdt_period as defined above. Finally if you would like to | 
 | 248 | 	replace the default Watchdog Handler you can implement the | 
 | 249 | 	WatchdogHandler() function in your own code. | 
 | 250 |  | 
 | 251 | eurotechwdt.c -- Eurotech CPU-1220/1410 | 
 | 252 |  | 
 | 253 | 	The timeout can be set using the SETTIMEOUT ioctl and defaults | 
 | 254 | 	to 60 seconds. | 
 | 255 |  | 
 | 256 | 	Also has a module parameter "ev" (event type) which controls | 
 | 257 | 	what should happen on a timeout: either the string "int", or | 
 | 258 | 	anything else, which causes a reboot.  [FIXME -- better description] | 
 | 259 |  | 
 | 260 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 261 |  | 
 | 262 | 	GETSUPPORT returns WDIOF_CARDRESET and WDIOF_SETTIMEOUT but | 
 | 263 | 	GETSTATUS is not supported and GETBOOTSTATUS just returns 0. | 
 | 264 |  | 
 | 265 | i810-tco.c -- Intel 810 chipset | 
 | 266 |  | 
 | 267 | 	Also has support for a lot of other i8x0 stuff, but the | 
 | 268 | 	watchdog is one of the things. | 
 | 269 |  | 
 | 270 | 	The timeout is set using the module parameter "i810_margin", | 
 | 271 | 	which is in steps of 0.6 seconds where 2<i810_margin<64.  The | 
 | 272 | 	driver supports the SETTIMEOUT ioctl. | 
 | 273 |  | 
 | 274 | 	Supports CONFIG_WATCHDOG_NOWAYOUT. | 
 | 275 |  | 
 | 276 | 	GETSUPPORT returns WDIOF_SETTIMEOUT.  The GETSTATUS call | 
 | 277 | 	returns some kind of timer value which is not compatible with | 
 | 278 | 	the other drivers.  GETBOOTSTATUS returns some kind of | 
 | 279 | 	hardware specific boot status.  [FIXME -- describe this] | 
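
For the i810-tco margin described above, the step arithmetic can be
sketched as follows.  The function i810_margin_to_ms() is hypothetical
and only restates the 0.6 second step and the 2<i810_margin<64 range
given in this entry; it is not code from the driver.

```c
/*
 * Convert an i810_margin value (steps of 0.6 seconds) to milliseconds.
 * Returns -1 if the margin is outside the range 2 < margin < 64.
 */
static int i810_margin_to_ms(int margin)
{
	if (margin <= 2 || margin >= 64)
		return -1;
	return margin * 600;
}
```

A margin of 50 therefore corresponds to 30 seconds.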
 | 280 |  | 
 | 281 | ib700wdt.c -- IB700 Single Board Computer | 
 | 282 |  | 
 | 283 | 	Default timeout of 30 seconds and the timeout is settable | 
 | 284 | 	using the SETTIMEOUT ioctl.  Note that only a few timeout | 
 | 285 | 	values are supported. | 
 | 286 |  | 
 | 287 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 288 |  | 
 | 289 | 	GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. | 
 | 290 | 	The GETSTATUS call returns if the device is open or not. | 
 | 291 | 	[FIXME -- silliness again?] | 
 | 292 |  | 
 | 293 | machzwd.c -- MachZ ZF-Logic | 
 | 294 |  | 
 | 295 | 	Hardcoded timeout of 10 seconds | 
 | 296 |  | 
 | 297 | 	Has a module parameter "action" that controls what happens | 
 | 298 | 	when the timeout runs out which can be 0 = RESET (default),  | 
 | 299 | 	1 = SMI, 2 = NMI, 3 = SCI. | 
 | 300 |  | 
 | 301 | 	Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character | 
 | 302 | 	'V' close handling. | 
 | 303 |  | 
 | 304 | 	GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call | 
 | 305 | 	returns if the device is open or not.  [FIXME -- silliness | 
 | 306 | 	again?] | 
 | 307 |  | 
 | 308 | mixcomwd.c -- MixCom Watchdog | 
 | 309 |  | 
 | 310 | 	[FIXME -- I'm unable to tell what the timeout is] | 
 | 311 |  | 
 | 312 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 313 |  | 
 | 314 | 	GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns if | 
 | 315 | 	the device is opened or not [FIXME -- I'm not really sure how | 
 | 316 | 	this works, there seems to be some magic connected to | 
 | 317 | 	CONFIG_WATCHDOG_NOWAYOUT] | 
 | 318 |  | 
 | 319 | pcwd.c -- Berkshire PC Watchdog | 
 | 320 |  | 
 | 321 | 	Hardcoded timeout of 1.5 seconds | 
 | 322 |  | 
 | 323 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 324 |  | 
 | 325 | 	GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both | 
 | 326 | 	GETSTATUS and GETBOOTSTATUS return something useful. | 
 | 327 |  | 
 | 328 | 	The SETOPTIONS call can be used to enable and disable the card | 
 | 329 | 	and to ask the driver to call panic if the system overheats. | 
 | 330 |  | 
 | 331 | sbc60xxwdt.c -- 60xx Single Board Computer | 
 | 332 |  | 
 | 333 | 	Hardcoded timeout of 10 seconds | 
 | 334 |  | 
 | 335 | 	Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic | 
 | 336 | 	character 'V' close handling. | 
 | 337 |  | 
 | 338 | 	No bits set in GETSUPPORT | 
 | 339 |  | 
 | 340 | scx200.c -- National SCx200 CPUs | 
 | 341 |  | 
 | 342 | 	Not in the kernel yet. | 
 | 343 |  | 
 | 344 | 	The timeout is set using a module parameter "margin" which | 
 | 345 | 	defaults to 60 seconds.  The timeout can also be set using | 
 | 346 | 	SETTIMEOUT and read using GETTIMEOUT. | 
 | 347 |  | 
 | 348 | 	Supports a module parameter "nowayout" that is initialized | 
 | 349 | 	with the value of CONFIG_WATCHDOG_NOWAYOUT.  Also supports the | 
 | 350 | 	magic character 'V' handling. | 
 | 351 |  | 
 | 352 | shwdt.c -- SuperH 3/4 processors | 
 | 353 |  | 
 | 354 | 	[FIXME -- I'm unable to tell what the timeout is] | 
 | 355 |  | 
 | 356 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 357 |  | 
 | 358 | 	GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call | 
 | 359 | 	returns if the device is open or not.  [FIXME -- silliness | 
 | 360 | 	again?] | 
 | 361 |  | 
 | 362 | softdog.c -- Software watchdog | 
 | 363 |  | 
 | 364 | 	The timeout is set with the module parameter "soft_margin" | 
 | 365 | 	which defaults to 60 seconds, the timeout is also settable | 
 | 366 | 	using the SETTIMEOUT ioctl. | 
 | 367 |  | 
 | 368 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 369 |  | 
 | 370 | 	WDIOF_SETTIMEOUT bit set in GETSUPPORT | 
 | 371 |  | 
 | 372 | w83877f_wdt.c -- W83877F Computer | 
 | 373 |  | 
 | 374 | 	Hardcoded timeout of 30 seconds | 
 | 375 |  | 
 | 376 | 	Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic | 
 | 377 | 	character 'V' close handling. | 
 | 378 |  | 
 | 379 | 	No bits set in GETSUPPORT | 
 | 380 |  | 
 | 381 | w83627hf_wdt.c -- w83627hf watchdog | 
 | 382 |  | 
 | 383 | 	Timeout that defaults to 60 seconds, supports SETTIMEOUT. | 
 | 384 |  | 
 | 385 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 386 |  | 
 | 387 | 	GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. | 
 | 388 | 	The GETSTATUS call returns if the device is open or not. | 
 | 389 |  | 
 | 390 | wdt.c -- ICS WDT500/501 ISA and | 
 | 391 | wdt_pci.c -- ICS WDT500/501 PCI | 
 | 392 |  | 
 | 393 | 	Default timeout of 60 seconds.  The timeout is also settable | 
 | 394 | 	using the SETTIMEOUT ioctl. | 
 | 395 |  | 
 | 396 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 397 |  | 
 | 398 | 	GETSUPPORT returns with bits set depending on the actual | 
 | 399 | 	card. The WDT501 supports a lot of external monitoring, the | 
 | 400 | 	WDT500 much less. | 
 | 401 |  | 
 | 402 | wdt285.c -- Footbridge watchdog | 
 | 403 |  | 
 | 404 | 	The timeout is set with the module parameter "soft_margin" | 
 | 405 | 	which defaults to 60 seconds.  The timeout is also settable | 
 | 406 | 	using the SETTIMEOUT ioctl. | 
 | 407 |  | 
 | 408 | 	Does not support CONFIG_WATCHDOG_NOWAYOUT | 
 | 409 |  | 
 | 410 | 	WDIOF_SETTIMEOUT bit set in GETSUPPORT | 
 | 411 |  | 
 | 412 | wdt977.c -- Netwinder W83977AF chip | 
 | 413 |  | 
 | 414 | 	Hardcoded timeout of 3 minutes | 
 | 415 |  | 
 | 416 | 	Supports CONFIG_WATCHDOG_NOWAYOUT | 
 | 417 |  | 
 | 418 | 	Does not support any ioctls at all. | 
 | 419 |  |