| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1 |  | 
 | 2 | [NMI watchdog is available for x86 and x86-64 architectures] | 
 | 3 |  | 
 | 4 | Is your system locking up unpredictably? No keyboard activity, just | 
 | 5 | a frustrating complete hard lockup? Do you want to help us debug | 
 | 6 | such lockups? If the answer to all of these is yes, this document is for you. | 
 | 7 |  | 
 | 8 | Many x86/x86-64 systems have a feature that enables | 
 | 9 | us to generate 'watchdog NMI interrupts'.  (NMI: Non-Maskable Interrupt, | 
 | 10 | which is delivered even if the system is otherwise locked up hard.) | 
 | 11 | This can be used to debug hard kernel lockups.  By executing periodic | 
 | 12 | NMI interrupts, the kernel can monitor whether any CPU has locked up, | 
| Cyrill Gorcunov | afda335 | 2008-06-27 19:43:40 +0400 | [diff] [blame] | 13 | and print out debugging messages if so. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 14 |  | 
 | 15 | In order to use the NMI watchdog, you need to have APIC support in your | 
 | 16 | kernel. For SMP kernels, APIC support gets compiled in automatically. For | 
 | 17 | UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local | 
 | 18 | APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and | 
 | 19 | features -> IO-APIC support on uniprocessors) in your kernel config. | 
 | 20 | CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC. | 
 | 21 | CONFIG_X86_UP_IOAPIC is for uniprocessor machines with an IO-APIC. [Note: | 
 | 22 | certain kernel debugging options, such as Kernel Stack Meter or Kernel | 
 | 23 | Tracer, may implicitly disable the NMI watchdog.] | 
 | 24 |  | 
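As a rough illustration of the configuration check described above, the following Python sketch scans kernel config text for the two uniprocessor APIC options. The function name `apic_options` and the sample text are hypothetical. Where the config file lives on a given system (for example under /boot) varies by distribution, so the sketch takes the text as an argument instead of assuming a path.

```python
import re

def apic_options(config_text):
    """Report whether the two UP APIC options named in the text are set to 'y'."""
    wanted = ("CONFIG_X86_UP_APIC", "CONFIG_X86_UP_IOAPIC")
    state = {name: False for name in wanted}
    for line in config_text.splitlines():
        # Only lines of the form CONFIG_FOO=y count as enabled;
        # "# CONFIG_FOO is not set" comments are ignored.
        match = re.match(r"^(CONFIG_\w+)=y$", line.strip())
        if match and match.group(1) in state:
            state[match.group(1)] = True
    return state

# Hypothetical config fragment, for illustration only.
sample = "CONFIG_X86_UP_APIC=y\n# CONFIG_X86_UP_IOAPIC is not set\n"
print(apic_options(sample))
```

On SMP and x86-64 kernels this check is unnecessary, since APIC support is compiled in automatically there.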
| Cyrill Gorcunov | afda335 | 2008-06-27 19:43:40 +0400 | [diff] [blame] | 25 | For x86-64, the needed APIC is always compiled in. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 26 |  | 
 | 27 | Using the local APIC (nmi_watchdog=2) needs the first performance register, | 
 | 28 | so you can't use that register for other purposes (such as high-precision | 
 | 29 | performance profiling). However, at least oprofile and the perfctr driver | 
 | 30 | disable the local APIC NMI watchdog automatically. | 
 | 31 |  | 
 | 32 | To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot | 
 | 33 | parameter.  For example, the relevant lilo.conf entry: | 
 | 34 |  | 
 | 35 |         append="nmi_watchdog=1" | 
 | 36 |  | 
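To confirm that the boot parameter actually reached the kernel, the command line can be inspected after boot (on Linux it is exposed in /proc/cmdline). The Python sketch below extracts the value from a command-line string. The function name `nmi_watchdog_mode` is hypothetical, and reporting the last occurrence when the parameter appears more than once is a choice made by this sketch, not a statement about how the kernel resolves duplicates.

```python
def nmi_watchdog_mode(cmdline):
    """Return N from an 'nmi_watchdog=N' token in a kernel command line, or None."""
    mode = None
    for token in cmdline.split():
        if token.startswith("nmi_watchdog="):
            # Keep the text after the first '='; later tokens overwrite earlier ones.
            mode = token.split("=", 1)[1]
    return mode

# Hypothetical command line, for illustration only.
print(nmi_watchdog_mode("ro root=/dev/sda1 nmi_watchdog=1 quiet"))
```

On a running system the argument would be the contents of /proc/cmdline.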
 | 37 | For SMP machines and UP machines with an IO-APIC, use nmi_watchdog=1. | 
 | 38 | For UP machines without an IO-APIC, use nmi_watchdog=2; this only works | 
 | 39 | for some processor types.  If in doubt, boot with nmi_watchdog=1 and | 
 | 40 | check the NMI count in /proc/interrupts; if the count is zero, | 
 | 41 | reboot with nmi_watchdog=2 and check the NMI count again.  If it is still | 
 | 42 | zero, report the problem: you probably have a processor that needs to be | 
 | 43 | added to the NMI code. | 
 | 44 |  | 
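The NMI count check described above can be scripted. The following Python sketch pulls the per-CPU counts out of /proc/interrupts-style text and reports whether any CPU has received an NMI. The function names and the sample text are hypothetical, and the exact column layout of /proc/interrupts differs between kernel versions, so the parser only assumes a line that starts with "NMI:" followed by one number per CPU.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description, if any
                counts.append(int(field))
            return counts
    return []  # no NMI line found

def watchdog_is_ticking(interrupts_text):
    """True if at least one CPU shows a non-zero NMI count."""
    return any(count > 0 for count in nmi_counts(interrupts_text))

# Hypothetical excerpt, for illustration only.
sample = """\
           CPU0       CPU1
  0:         42          0   IO-APIC-edge      timer
NMI:       1310       1298   Non-maskable interrupts
"""
print(nmi_counts(sample), watchdog_is_ticking(sample))
```

Because the local APIC watchdog does not tick while the CPU is halted, a count that grows slowly on an idle machine is not by itself a sign of trouble; a count that stays at zero is.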
 | 45 | A 'lockup' is the following scenario: if any CPU in the system does not | 
 | 46 | execute the periodic local timer interrupt for more than 5 seconds, then | 
 | 47 | the NMI handler generates an oops and kills the process. This | 
 | 48 | 'controlled crash' (and the resulting kernel messages) can be used to | 
 | 49 | debug the lockup. Thus whenever the lockup happens, wait 5 seconds and | 
 | 50 | the oops will show up automatically. If the kernel produces no messages | 
 | 51 | then the system has crashed so hard (e.g. hardware-wise) that either it | 
 | 52 | cannot even accept NMI interrupts, or the crash has made the kernel | 
 | 53 | unable to print messages. | 
 | 54 |  | 
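One way to picture the lockup rule above is as a small state machine that runs on every watchdog NMI. The Python sketch below is a simplified model and not the kernel's implementation: the class `CpuWatch`, its fields, and the use of wall-clock timestamps are all assumptions made for illustration. It declares a lockup when the CPU's local timer interrupt count has not advanced for more than 5 seconds, measured from the first NMI that noticed no progress.

```python
LOCKUP_SECONDS = 5  # threshold quoted in the text above

class CpuWatch:
    """Hypothetical per-CPU watchdog state; not a kernel data structure."""

    def __init__(self):
        self.last_timer_count = 0
        self.stalled_since = None  # time of the first NMI that saw no progress

    def on_nmi(self, timer_count, now):
        """Run on each watchdog NMI; return True when a lockup is declared."""
        if timer_count != self.last_timer_count:
            # The timer interrupt ran since the last NMI: the CPU is alive.
            self.last_timer_count = timer_count
            self.stalled_since = None
            return False
        if self.stalled_since is None:
            self.stalled_since = now
        return now - self.stalled_since > LOCKUP_SECONDS

watch = CpuWatch()
for count, t in [(1, 0.0), (1, 1.0), (1, 6.0), (1, 6.5)]:
    print(t, watch.on_nmi(count, t))
```

In this model only the last call reports a lockup, because that is the first NMI at which the timer count has been stuck for more than 5 seconds.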
 | 55 | Be aware that when using the local APIC, the frequency of the NMI interrupts | 
 | 56 | it generates depends on the system load. The local APIC NMI watchdog, | 
 | 57 | lacking a better source, uses the "cycles unhalted" event. As you may | 
 | 58 | guess it doesn't tick when the CPU is in the halted state (which happens | 
 | 59 | when the system is idle), but if your system locks up on anything but the | 
 | 60 | "hlt" processor instruction, the watchdog will trigger very soon as the | 
 | 61 | "cycles unhalted" event will happen every clock tick. If it locks up on | 
 | 62 | "hlt", then you are out of luck -- the event will not happen at all and the | 
 | 63 | watchdog won't trigger. This is a shortcoming of the local APIC watchdog | 
 | 64 | -- unfortunately there is no "clock ticks" event that would work all the | 
| Cyrill Gorcunov | afda335 | 2008-06-27 19:43:40 +0400 | [diff] [blame] | 65 | time. The I/O APIC watchdog is driven externally and has no such shortcoming. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 66 | But its NMI frequency is much higher, resulting in a more significant hit | 
 | 67 | to the overall system performance. | 
 | 68 |  | 
| Cyrill Gorcunov | afda335 | 2008-06-27 19:43:40 +0400 | [diff] [blame] | 69 | On x86, nmi_watchdog is disabled by default, so you have to enable it with | 
 | 70 | a boot-time parameter. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 71 |  | 
| Aristeu Rozanski | 8a1c8eb | 2008-10-30 13:08:50 -0400 | [diff] [blame] | 72 | It's possible to disable the NMI watchdog at run time by writing "0" to | 
 | 73 | /proc/sys/kernel/nmi_watchdog. Writing "1" to the same file will re-enable | 
 | 74 | the NMI watchdog. Note that you still need to pass the "nmi_watchdog=" | 
 | 75 | parameter at boot time. | 
 | 76 |  | 
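The run-time switch described above is an ordinary file write. The following Python sketch wraps it in two helper functions. The helper names are hypothetical, and the `path` argument exists only so the functions can be tried against a scratch file; writing to the real file requires root privileges.

```python
NMI_WATCHDOG_SYSCTL = "/proc/sys/kernel/nmi_watchdog"  # path given in the text

def set_nmi_watchdog(enabled, path=NMI_WATCHDOG_SYSCTL):
    """Write "1" (enable) or "0" (disable) to the sysctl file."""
    with open(path, "w") as f:
        f.write("1" if enabled else "0")

def get_nmi_watchdog(path=NMI_WATCHDOG_SYSCTL):
    """Return the current contents of the sysctl file, stripped of whitespace."""
    with open(path) as f:
        return f.read().strip()
```

For example, `set_nmi_watchdog(False)` disables the watchdog and `set_nmi_watchdog(True)` re-enables it, provided the kernel was booted with the "nmi_watchdog=" parameter as noted above.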
| Ingo Molnar | 1bb3a02 | 2008-06-30 08:47:42 +0200 | [diff] [blame] | 77 | NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally | 
| Cyrill Gorcunov | afda335 | 2008-06-27 19:43:40 +0400 | [diff] [blame] | 78 | on x86 SMP boxes. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 79 |  | 
 | 80 | [ feel free to send bug reports, suggestions and patches to | 
 | 81 |   Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing | 
 | 82 |   list at <linux-smp@vger.kernel.org> ] | 
 | 83 |  |