[PATCH] check nmi watchdog is broken
A bug against an xSeries system showed up recently noting that the
check_nmi_watchdog() test was failing.
I have been investigating it and discovered in both i386 and x86_64 the
recent change to the routine to use the cpu_callin_map has uncovered a
problem. Prior to that change, on an SMP box, the test was trivally
passing because all cpu's were found to not yet be online, but now with the
callin_map they are discovered, it goes on to test the counter and they
have not yet begun to increment, so it announces a CPU is stuck and bails
out.
On all the systems I have access to test, the announcement of failure is
also bougs... by the time you can login and check /proc/interrupts, the
NMI count is happily incrementing on all CPUs. Its just that the test is
being done too early.
I have tried moving the call to the test around a bit, and it was always
too early. I finally hit on this proposed solution, it delays the routine
via a late_initcall(), seems like the right solution to me.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c
index 2f89d00..2c0ee9c 100644
--- a/arch/i386/kernel/nmi.c
+++ b/arch/i386/kernel/nmi.c
@@ -102,20 +102,21 @@
(P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT| \
P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE)
-int __init check_nmi_watchdog (void)
+static int __init check_nmi_watchdog(void)
{
unsigned int prev_nmi_count[NR_CPUS];
int cpu;
- printk(KERN_INFO "testing NMI watchdog ... ");
+ if (nmi_watchdog == NMI_NONE)
+ return 0;
+
+ printk(KERN_INFO "Testing NMI watchdog ... ");
for (cpu = 0; cpu < NR_CPUS; cpu++)
prev_nmi_count[cpu] = per_cpu(irq_stat, cpu).__nmi_count;
local_irq_enable();
mdelay((10*1000)/nmi_hz); // wait 10 ticks
- /* FIXME: Only boot CPU is online at this stage. Check CPUs
- as they come up. */
for (cpu = 0; cpu < NR_CPUS; cpu++) {
#ifdef CONFIG_SMP
/* Check cpu_callin_map here because that is set
@@ -139,6 +140,8 @@
return 0;
}
+/* This needs to happen later in boot so counters are working */
+late_initcall(check_nmi_watchdog);
static int __init setup_nmi_watchdog(char *str)
{