The Linux Watchdog driver API.

Copyright 2002 Christer Weingel <wingel@nano-system.com>

Some parts of this document are copied verbatim from the sbc60xxwdt
driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>

This document describes the state of the Linux 2.4.18 kernel.

Introduction:

A Watchdog Timer (WDT) is a hardware circuit that can reset the
computer system in case of a software fault. You probably knew that
already.

Usually a userspace daemon will notify the kernel watchdog driver via the
/dev/watchdog special device file that userspace is still alive, at
regular intervals. When such a notification occurs, the driver will
usually tell the hardware watchdog that everything is in order, and
that the watchdog should wait for yet another little while to reset
the system. If userspace fails (RAM error, kernel bug, whatever), the
notifications cease to occur, and the hardware watchdog will reset the
system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad hoc construction and different
drivers implement different, and sometimes incompatible, parts of it.
This file is an attempt to document the existing usage and allow
future driver writers to use it as a reference.

The simplest API:

All drivers support the basic mode of operation, where the watchdog
activates as soon as /dev/watchdog is opened and will reboot unless
the watchdog is pinged within a certain time; this time is called the
timeout or margin. The simplest way to ping the watchdog is to write
some data to the device. So a very simple watchdog daemon would look
like this:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, const char *argv[]) {
    /* Opening the device starts the watchdog. */
    int fd = open("/dev/watchdog", O_WRONLY);
    if (fd == -1) {
        perror("watchdog");
        exit(1);
    }
    while (1) {
        /* Any write to the device counts as a ping. */
        write(fd, "\0", 1);
        sleep(10);
    }
}

A more advanced daemon could, for example, check that an HTTP server is
still responding before doing the write call to ping the watchdog.
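
As a rough sketch of that idea, the ping can be made conditional on the
result of a health check. The function name ping_if_healthy() and its
"healthy" argument are made up for this example; how the daemon decides
that the service is healthy (an HTTP request, for instance) is left to
the caller.

```c
#include <unistd.h>

/*
 * Ping the watchdog on fd only when the caller's health check passed.
 * "healthy" is a hypothetical flag computed by the daemon itself.
 * Returns 1 if a ping was written, 0 if the ping was skipped,
 * -1 if the write failed.
 */
int ping_if_healthy(int fd, int healthy)
{
    if (!healthy)
        return 0;   /* no ping: the watchdog keeps counting down */
    if (write(fd, "\0", 1) != 1)
        return -1;
    return 1;
}
```

If the check keeps failing for longer than the timeout, no ping reaches
the driver and the watchdog resets the system, which is the intended
outcome here.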

When the device is closed, the watchdog is disabled. This is not
always such a good idea, since if there is a bug in the watchdog
daemon and it crashes, the system will not reboot. Because of this,
some of the drivers support the configuration option "Disable watchdog
shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when
compiling the kernel, there is no way of disabling the watchdog once
it has been started. So, if the watchdog daemon crashes, the system
will reboot after the timeout has passed.

Some other drivers will not disable the watchdog, unless a specific
magic character 'V' has been sent to /dev/watchdog just before closing
the file. If the userspace daemon closes the file without sending
this special character, the driver will assume that the daemon (and
userspace in general) died, and will stop pinging the watchdog without
disabling it first. This will then cause a reboot.
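
A daemon that wants to stop the watchdog on purpose could do something
like the following. This is only a sketch: the function name
watchdog_close_cleanly() is invented here, and it assumes a driver with
the magic character handling and a kernel built without
CONFIG_WATCHDOG_NOWAYOUT.

```c
#include <unistd.h>

/*
 * Send the magic character 'V' and then close the device, so that a
 * driver with magic close handling disables the watchdog.
 * Returns 0 on success, -1 on error.
 */
int watchdog_close_cleanly(int fd)
{
    if (write(fd, "V", 1) != 1) {
        /* 'V' was not delivered; the watchdog may stay armed. */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Nothing else should be written between the 'V' and the close, since the
character has to arrive just before the file is closed.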

The ioctl API:

All conforming drivers also support an ioctl API.

Pinging the watchdog using an ioctl:

All drivers that have an ioctl interface support at least one ioctl,
KEEPALIVE. This ioctl does exactly the same thing as a write to the
watchdog device, so the main loop in the above program could be
replaced with the following (after including <sys/ioctl.h> and
<linux/watchdog.h>):

    while (1) {
        ioctl(fd, WDIOC_KEEPALIVE, 0);
        sleep(10);
    }

The argument to the ioctl is ignored.
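
Since not every driver has an ioctl interface, a daemon may want to try
the ioctl first and fall back to a plain write. The helper below is a
sketch of that approach; the name watchdog_ping() is an assumption of
this example, not part of the API.

```c
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/*
 * Ping the watchdog with WDIOC_KEEPALIVE; if the ioctl is rejected,
 * fall back to writing one byte to the device.
 * Returns 0 on success, -1 if both methods fail.
 */
int watchdog_ping(int fd)
{
    if (ioctl(fd, WDIOC_KEEPALIVE, 0) == 0)
        return 0;
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}
```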

Setting and getting the timeout:

For some drivers it is possible to modify the watchdog timeout on the
fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT
flag set in their option field. The argument is a pointer to an integer
representing the timeout in seconds. The driver returns the real
timeout used in the same variable, and this timeout might differ from
the requested one due to limitations of the hardware.

    int timeout = 45;
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
    printf("The timeout was set to %d seconds\n", timeout);

This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.
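
A more careful caller might first check that the driver advertises
WDIOF_SETTIMEOUT and handle ioctl failures. The following is a minimal
sketch under that assumption; watchdog_set_timeout() is a name chosen
for this example, and the option field is read with the GETSUPPORT
ioctl.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Request a timeout of "seconds" on an open watchdog descriptor.
 * Returns the timeout the driver actually applied, or -1 if the
 * driver does not advertise WDIOF_SETTIMEOUT or an ioctl fails.
 */
int watchdog_set_timeout(int fd, int seconds)
{
    struct watchdog_info info;
    int timeout = seconds;

    if (ioctl(fd, WDIOC_GETSUPPORT, &info) == -1)
        return -1;
    if (!(info.options & WDIOF_SETTIMEOUT))
        return -1;
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) == -1)
        return -1;
    return timeout;   /* may differ from "seconds" */
}
```

The caller should use the returned value, not the requested one, when
deciding how often to ping.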

Starting with the Linux 2.4.18 kernel, it is possible to query the
current timeout using the GETTIMEOUT ioctl.

    ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
    printf("The timeout is %d seconds\n", timeout);

Pretimeouts:

Some watchdog timers can be set to have a trigger go off before the
actual time they will reset the system. This can be done with an NMI,
interrupt, or other mechanism. This allows Linux to record useful
information (like panic information and kernel coredumps) before it
resets.

    int pretimeout = 10;
    ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);

Note that the pretimeout is the number of seconds before the time
when the timeout will go off. It is not the number of seconds until
the pretimeout. So, for instance, if you set the timeout to 60 seconds
and the pretimeout to 10 seconds, the pretimeout will go off in 50
seconds. Setting a pretimeout to zero disables it.
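
The arithmetic can be written down as a small helper. This is purely
illustrative: pretimeout_fires_after() is not part of the API, and
treating a pretimeout that is not smaller than the timeout as invalid
is an assumption of this sketch.

```c
/*
 * Number of seconds after the last ping at which the pretimeout
 * triggers. Returns -1 if the pretimeout is disabled (zero) or is
 * not smaller than the timeout.
 */
int pretimeout_fires_after(int timeout, int pretimeout)
{
    if (pretimeout <= 0 || pretimeout >= timeout)
        return -1;
    return timeout - pretimeout;
}
```

With a timeout of 60 and a pretimeout of 10 this returns 50, matching
the example above.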

There is also a get function for getting the pretimeout:

    ioctl(fd, WDIOC_GETPRETIMEOUT, &pretimeout);
    printf("The pretimeout is %d seconds\n", pretimeout);

Not all watchdog drivers will support a pretimeout.

Get the number of seconds before reboot:

Some watchdog drivers have the ability to report the remaining time
before the system will reboot. WDIOC_GETTIMELEFT is the ioctl
that returns the number of seconds before reboot.

    int timeleft;
    ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
    printf("The time left before reboot is %d seconds\n", timeleft);

Environmental monitoring:

All watchdog drivers are required to return more information about the
system. Some do temperature, fan and power level monitoring, and some can
tell you the reason for the last reboot of the system. The GETSUPPORT
ioctl is available to ask what the device can do:

    struct watchdog_info ident;
    ioctl(fd, WDIOC_GETSUPPORT, &ident);

The fields returned in the ident struct are:

    identity            a string identifying the watchdog driver
    firmware_version    the firmware version of the card if available
    options             flags describing what the device supports
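
As an illustration, the three fields could be printed as shown below.
The helper name watchdog_print_info() is invented for this sketch; the
field types are taken from struct watchdog_info in <linux/watchdog.h>.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Query the driver with WDIOC_GETSUPPORT and print the result.
 * Returns 0 on success, -1 if the ioctl fails.
 */
int watchdog_print_info(int fd)
{
    struct watchdog_info ident;

    if (ioctl(fd, WDIOC_GETSUPPORT, &ident) == -1) {
        perror("WDIOC_GETSUPPORT");
        return -1;
    }
    /* identity is a fixed 32-byte array; limit the output to that. */
    printf("identity:         %.32s\n", (const char *)ident.identity);
    printf("firmware_version: %u\n", ident.firmware_version);
    printf("options:          0x%08x\n", ident.options);
    return 0;
}
```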

The options field can have the following bits set, and describes what
kind of information the GETSTATUS and GETBOOTSTATUS ioctls can
return. [FIXME -- Is this correct?]

    WDIOF_OVERHEAT      Reset due to CPU overheat

The machine was last rebooted by the watchdog because the thermal limit was
exceeded

    WDIOF_FANFAULT      Fan failed

A system fan monitored by the watchdog card has failed

    WDIOF_EXTERN1       External relay 1

External monitoring relay/source 1 was triggered. Controllers intended for
real world applications include external monitoring pins that will trigger
a reset.

    WDIOF_EXTERN2       External relay 2

External monitoring relay/source 2 was triggered

    WDIOF_POWERUNDER    Power bad/power fault

The machine is showing an undervoltage status

    WDIOF_CARDRESET     Card previously reset the CPU

The last reboot was caused by the watchdog card

    WDIOF_POWEROVER     Power over voltage

The machine is showing an overvoltage status. Note that if one level is
under and one over both bits will be set - this may seem odd but makes
sense.

    WDIOF_KEEPALIVEPING Keep alive ping reply

The watchdog saw a keepalive ping since it was last queried.

    WDIOF_SETTIMEOUT    Can set/get the timeout

The watchdog timeout can be set and read back with the SETTIMEOUT and
GETTIMEOUT ioctls.

    WDIOF_PRETIMEOUT    Pretimeout (in seconds), get/set

The watchdog can do pretimeouts.

For those drivers that return any bits set in the option field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
status, and the status at the last reboot, respectively.

    int flags;
    ioctl(fd, WDIOC_GETSTATUS, &flags);

    or

    ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);

Note that not all devices support these two calls, and some only
support the GETBOOTSTATUS call.
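
The returned flags can be decoded by testing each WDIOF_ status bit in
turn. The sketch below does this for the status bits listed earlier;
the function name watchdog_print_status() and the wording of the
messages are choices made for this example only.

```c
#include <stdio.h>
#include <linux/watchdog.h>

/*
 * Print one line for every status bit set in flags, as returned by
 * WDIOC_GETSTATUS or WDIOC_GETBOOTSTATUS.
 * Returns the number of status bits that were recognised.
 */
int watchdog_print_status(int flags)
{
    static const struct {
        int bit;
        const char *text;
    } names[] = {
        { WDIOF_OVERHEAT,      "reset due to CPU overheat" },
        { WDIOF_FANFAULT,      "fan failed" },
        { WDIOF_EXTERN1,       "external relay 1" },
        { WDIOF_EXTERN2,       "external relay 2" },
        { WDIOF_POWERUNDER,    "power bad/power fault" },
        { WDIOF_CARDRESET,     "card previously reset the CPU" },
        { WDIOF_POWEROVER,     "power over voltage" },
        { WDIOF_KEEPALIVEPING, "keep alive ping reply" },
    };
    int count = 0;
    size_t i;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (flags & names[i].bit) {
            printf("%s\n", names[i].text);
            count++;
        }
    }
    return count;
}
```

Capability bits such as WDIOF_SETTIMEOUT are deliberately left out of
the table, since they describe what the driver can do and not what
happened.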

Some drivers can measure the temperature using the GETTEMP ioctl. The
returned value is the temperature in degrees Fahrenheit.

    int temperature;
    ioctl(fd, WDIOC_GETTEMP, &temperature);
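
For callers that prefer Celsius, the reading can be converted with the
usual formula. The helper name fahrenheit_to_celsius() is just an
example.

```c
/* Convert a WDIOC_GETTEMP reading (degrees Fahrenheit) to Celsius. */
double fahrenheit_to_celsius(int fahrenheit)
{
    return (fahrenheit - 32) * 5.0 / 9.0;
}
```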

Finally the SETOPTIONS ioctl can be used to control some aspects of
the card's operation; right now the pcwd driver is the only one
supporting this ioctl.

    int options = 0;
    ioctl(fd, WDIOC_SETOPTIONS, &options);

The following options are available:

    WDIOS_DISABLECARD   Turn off the watchdog timer
    WDIOS_ENABLECARD    Turn on the watchdog timer
    WDIOS_TEMPPANIC     Kernel panic on temperature trip

[FIXME -- better explanations]
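
As a sketch of how these options might be used, the helper below turns
the watchdog timer on or off. The name watchdog_set_enabled() is made
up, and passing the options by pointer is an assumption based on how
WDIOC_SETOPTIONS is declared in <linux/watchdog.h>; individual drivers
may behave differently.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Turn the watchdog timer on (enable != 0) or off (enable == 0)
 * using WDIOC_SETOPTIONS. Returns 0 on success, -1 on failure.
 */
int watchdog_set_enabled(int fd, int enable)
{
    int options = enable ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;

    return ioctl(fd, WDIOC_SETOPTIONS, &options) == -1 ? -1 : 0;
}
```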

Implementations in the current drivers in the kernel tree:

Here I have tried to summarize what the different drivers support and
where they do strange things compared to the other drivers.

acquirewdt.c -- Acquire Single Board Computer

    This driver has a hardcoded timeout of 1 minute

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns KEEPALIVEPING. GETSTATUS will return 1 if
    the device is open, 0 if not. [FIXME -- isn't this rather
    silly? To be able to use the ioctl, the device must be open
    and so GETSTATUS will always return 1].

advantechwdt.c -- Advantech Single Board Computer

    Timeout that defaults to 60 seconds, supports SETTIMEOUT.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
    The GETSTATUS call returns if the device is open or not.
    [FIXME -- silliness again?]

booke_wdt.c -- PowerPC BookE Watchdog Timer

    Timeout default varies according to frequency, supports
    SETTIMEOUT

    Watchdog can not be turned off, CONFIG_WATCHDOG_NOWAYOUT
    does not make sense

    GETSUPPORT returns the watchdog_info struct, and
    GETSTATUS returns the supported options. GETBOOTSTATUS
    returns a 1 if the last reset was caused by the
    watchdog and a 0 otherwise. This watchdog can not be
    disabled once it has been started. The wdt_period kernel
    parameter selects which bit of the time base changing
    from 0->1 will trigger the watchdog exception. Changing
    the timeout from the ioctl calls will change the
    wdt_period as defined above. Finally if you would like to
    replace the default Watchdog Handler you can implement the
    WatchdogHandler() function in your own code.

eurotechwdt.c -- Eurotech CPU-1220/1410

    The timeout can be set using the SETTIMEOUT ioctl and defaults
    to 60 seconds.

    Also has a module parameter "ev" (event type) which controls
    what should happen on a timeout: either the string "int", or
    anything else, which causes a reboot. [FIXME -- better description]

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns CARDRESET and WDIOF_SETTIMEOUT but
    GETSTATUS is not supported and GETBOOTSTATUS just returns 0.

i810-tco.c -- Intel 810 chipset

    Also has support for a lot of other i8x0 stuff, but the
    watchdog is one of the things.

    The timeout is set using the module parameter "i810_margin",
    which is in steps of 0.6 seconds where 2<i810_margin<64. The
    driver supports the SETTIMEOUT ioctl.

    Supports CONFIG_WATCHDOG_NOWAYOUT.

    GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call
    returns some kind of timer value which is not compatible with
    the other drivers. GETBOOTSTATUS returns some kind of
    hardware specific boot status. [FIXME -- describe this]
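
    To illustrate the i810_margin units, the conversion to a timeout
    can be sketched as below. The helper is hypothetical and assumes
    that one step is exactly 0.6 seconds; the result is in tenths of
    a second to avoid floating point.

```c
/*
 * Convert an i810_margin value to a timeout in tenths of a second,
 * assuming one step equals 0.6 seconds.
 * Returns -1 if margin is outside the range 2 < margin < 64.
 */
int i810_margin_to_deciseconds(int margin)
{
    if (margin <= 2 || margin >= 64)
        return -1;
    return margin * 6;
}
```

    Under that assumption a margin of 50 corresponds to 30 seconds.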

ib700wdt.c -- IB700 Single Board Computer

    Default timeout of 30 seconds and the timeout is settable
    using the SETTIMEOUT ioctl. Note that only a few timeout
    values are supported.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
    The GETSTATUS call returns if the device is open or not.
    [FIXME -- silliness again?]

machzwd.c -- MachZ ZF-Logic

    Hardcoded timeout of 10 seconds

    Has a module parameter "action" that controls what happens
    when the timeout runs out which can be 0 = RESET (default),
    1 = SMI, 2 = NMI, 3 = SCI.

    Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character
    'V' close handling.

    GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
    returns if the device is open or not. [FIXME -- silliness
    again?]

mixcomwd.c -- MixCom Watchdog

    [FIXME -- I'm unable to tell what the timeout is]

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns if
    the device is opened or not [FIXME -- I'm not really sure how
    this works, there seems to be some magic connected to
    CONFIG_WATCHDOG_NOWAYOUT]

pcwd.c -- Berkshire PC Watchdog

    Hardcoded timeout of 1.5 seconds

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both
    GETSTATUS and GETBOOTSTATUS return something useful.

    The SETOPTIONS call can be used to enable and disable the card
    and to ask the driver to call panic if the system overheats.

sbc60xxwdt.c -- 60xx Single Board Computer

    Hardcoded timeout of 10 seconds

    Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
    character 'V' close handling.

    No bits set in GETSUPPORT

scx200.c -- National SCx200 CPUs

    Not in the kernel yet.

    The timeout is set using a module parameter "margin" which
    defaults to 60 seconds. The timeout can also be set using
    SETTIMEOUT and read using GETTIMEOUT.

    Supports a module parameter "nowayout" that is initialized
    with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the
    magic character 'V' handling.

shwdt.c -- SuperH 3/4 processors

    [FIXME -- I'm unable to tell what the timeout is]

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
    returns if the device is open or not. [FIXME -- silliness
    again?]

softdog.c -- Software watchdog

    The timeout is set with the module parameter "soft_margin"
    which defaults to 60 seconds. The timeout is also settable
    using the SETTIMEOUT ioctl.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    WDIOF_SETTIMEOUT bit set in GETSUPPORT

w83877f_wdt.c -- W83877F Computer

    Hardcoded timeout of 30 seconds

    Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
    character 'V' close handling.

    No bits set in GETSUPPORT

w83627hf_wdt.c -- w83627hf watchdog

    Timeout that defaults to 60 seconds, supports SETTIMEOUT.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
    The GETSTATUS call returns if the device is open or not.

wdt.c -- ICS WDT500/501 ISA and
wdt_pci.c -- ICS WDT500/501 PCI

    Default timeout of 60 seconds. The timeout is also settable
    using the SETTIMEOUT ioctl.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns with bits set depending on the actual
    card. The WDT501 supports a lot of external monitoring, the
    WDT500 much less.

wdt285.c -- Footbridge watchdog

    The timeout is set with the module parameter "soft_margin"
    which defaults to 60 seconds. The timeout is also settable
    using the SETTIMEOUT ioctl.

    Does not support CONFIG_WATCHDOG_NOWAYOUT

    WDIOF_SETTIMEOUT bit set in GETSUPPORT

wdt977.c -- Netwinder W83977AF chip

    Hardcoded timeout of 3 minutes

    Supports CONFIG_WATCHDOG_NOWAYOUT

    Does not support any ioctls at all.
459