x86: fix high synchronize_rcu() latency on idle system

An otherwise idle system takes about 3 ticks per network
interface in unregister_netdev() due to multiple calls to synchronize_rcu(),
which adds up to quite a few seconds for tearing down thousands of
interfaces.  By flushing pending RCU callbacks in the idle loop, the system
makes progress hundreds of times faster.  If this is indeed a sane thing to
do, it probably needs to be done for architectures other than x86 as well.
And yes, the network stack shouldn't call synchronize_rcu() quite so much,
but fixing that is a little more involved.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7a61b54..69a69c3 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -198,6 +198,9 @@
 			rmb();
 			idle = pm_idle;
 
+			if (rcu_pending(cpu))
+				rcu_check_callbacks(cpu, 0);
+
 			if (!idle)
 				idle = default_idle;