| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 1 | Using RCU's CPU Stall Detector | 
 | 2 |  | 
| Paul E. McKenney | a00e0d71 | 2011-02-08 17:14:39 -0800 | [diff] [blame] | 3 | The rcu_cpu_stall_suppress module parameter enables RCU's CPU stall | 
 | 4 | detector, which detects conditions that unduly delay RCU grace periods. | 
 | 5 | This module parameter enables CPU stall detection by default, but | 
 | 6 | may be overridden via boot-time parameter or at runtime via sysfs. | 
 | 7 | The stall detector's idea of what constitutes "unduly delayed" is | 
 | 8 | controlled by a set of kernel configuration variables and cpp macros: | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 9 |  | 
| Paul E. McKenney | a00e0d71 | 2011-02-08 17:14:39 -0800 | [diff] [blame] | 10 | CONFIG_RCU_CPU_STALL_TIMEOUT | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 11 |  | 
| Paul E. McKenney | a00e0d71 | 2011-02-08 17:14:39 -0800 | [diff] [blame] | 12 | 	This kernel configuration parameter defines the period of time | 
 | 13 | 	that RCU will wait from the beginning of a grace period until it | 
 | 14 | 	issues an RCU CPU stall warning.  This time period is normally | 
| Paul E. McKenney | 24cd7fd | 2012-01-20 17:35:55 -0800 | [diff] [blame] | 15 | 	sixty seconds. | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 16 |  | 
| Paul E. McKenney | 24cd7fd | 2012-01-20 17:35:55 -0800 | [diff] [blame] | 17 | 	This configuration parameter may be changed at runtime via the | 
 | 18 | 	/sys/module/rcutree/parameters/rcu_cpu_stall_timeout, however | 
 | 19 | 	this parameter is checked only at the beginning of a cycle. | 
 | 20 | 	So if you are 30 seconds into a 70-second stall, setting this | 
 | 21 | 	sysfs parameter to (say) five will shorten the timeout for the | 
 | 22 | 	-next- stall, or the following warning for the current stall | 
 | 23 | 	(assuming the stall lasts long enough).  It will not affect the | 
 | 24 | 	timing of the next warning for the current stall. | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 25 |  | 
| Paul E. McKenney | 24cd7fd | 2012-01-20 17:35:55 -0800 | [diff] [blame] | 26 | 	Stall-warning messages may be enabled and disabled completely via | 
 | 27 | 	/sys/module/rcutree/parameters/rcu_cpu_stall_suppress. | 
 | 28 |  | 
 | 29 | CONFIG_RCU_CPU_STALL_VERBOSE | 
 | 30 |  | 
 | 31 | 	This kernel configuration parameter causes the stall warning to | 
 | 32 | 	also dump the stacks of any tasks that are blocking the current | 
 | 33 | 	RCU-preempt grace period. | 
 | 34 |  | 
 | 35 | RCU_CPU_STALL_INFO | 
 | 36 |  | 
 | 37 | 	This kernel configuration parameter causes the stall warning to | 
 | 38 | 	print out additional per-CPU diagnostic information, including | 
 | 39 | 	information on scheduling-clock ticks and RCU's idle-CPU tracking. | 
 | 40 |  | 
 | 41 | RCU_STALL_DELAY_DELTA | 
 | 42 |  | 
 | 43 | 	Although the lockdep facility is extremely useful, it does add | 
 | 44 | 	some overhead.  Therefore, under CONFIG_PROVE_RCU, the | 
 | 45 | 	RCU_STALL_DELAY_DELTA macro allows five extra seconds before | 
 | 46 | 	giving an RCU CPU stall warning message. | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 47 |  | 
 | 48 | RCU_STALL_RAT_DELAY | 
 | 49 |  | 
| Paul E. McKenney | f1d507b | 2010-04-15 15:49:46 -0700 | [diff] [blame] | 50 | 	The CPU stall detector tries to make the offending CPU print its | 
 | 51 | 	own warnings, as this often gives better-quality stack traces. | 
 | 52 | 	However, if the offending CPU does not detect its own stall in | 
 | 53 | 	the number of jiffies specified by RCU_STALL_RAT_DELAY, then | 
 | 54 | 	some other CPU will complain.  This delay is normally set to | 
 | 55 | 	two jiffies. | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 56 |  | 
| Paul E. McKenney | f1d507b | 2010-04-15 15:49:46 -0700 | [diff] [blame] | 57 | When a CPU detects that it is stalling, it will print a message similar | 
 | 58 | to the following: | 
 | 59 |  | 
 | 60 | INFO: rcu_sched_state detected stall on CPU 5 (t=2500 jiffies) | 
 | 61 |  | 
 | 62 | This message indicates that CPU 5 detected that it was causing a stall, | 
 | 63 | and that the stall was affecting RCU-sched.  This message will normally be | 
 | 64 | followed by a stack dump of the offending CPU.  On TREE_RCU kernel builds, | 
 | 65 | RCU and RCU-sched are implemented by the same underlying mechanism, | 
 | 66 | while on TREE_PREEMPT_RCU kernel builds, RCU is instead implemented | 
 | 67 | by rcu_preempt_state. | 
 | 68 |  | 
 | 69 | On the other hand, if the offending CPU fails to print out a stall-warning | 
 | 70 | message quickly enough, some other CPU will print a message similar to | 
 | 71 | the following: | 
 | 72 |  | 
 | 73 | INFO: rcu_bh_state detected stalls on CPUs/tasks: { 3 5 } (detected by 2, 2502 jiffies) | 
 | 74 |  | 
 | 75 | This message indicates that CPU 2 detected that CPUs 3 and 5 were both | 
 | 76 | causing stalls, and that the stall was affecting RCU-bh.  This message | 
 | 77 | will normally be followed by stack dumps for each CPU.  Please note that | 
 | 78 | TREE_PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, | 
 | 79 | and that the tasks will be indicated by PID, for example, "P3421". | 
 | 80 | It is even possible for a rcu_preempt_state stall to be caused by both | 
 | 81 | CPUs -and- tasks, in which case the offending CPUs and tasks will all | 
 | 82 | be called out in the list. | 
 | 83 |  | 
 | 84 | Finally, if the grace period ends just as the stall warning starts | 
 | 85 | printing, there will be a spurious stall-warning message: | 
 | 86 |  | 
 | 87 | INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffies) | 
 | 88 |  | 
 | 89 | This is rare, but does happen from time to time in real life. | 
 | 90 |  | 
| Paul E. McKenney | 24cd7fd | 2012-01-20 17:35:55 -0800 | [diff] [blame] | 91 | If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, | 
 | 92 | more information is printed with the stall-warning message, for example: | 
 | 93 |  | 
 | 94 | 	INFO: rcu_preempt detected stall on CPU | 
| Paul E. McKenney | 6231069 | 2013-03-06 13:37:09 -0800 | [diff] [blame] | 95 | 	0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543 | 
| Paul E. McKenney | 24cd7fd | 2012-01-20 17:35:55 -0800 | [diff] [blame] | 96 | 	   (t=65000 jiffies) | 
 | 97 |  | 
 | 98 | In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is | 
 | 99 | printed: | 
 | 100 |  | 
 | 101 | 	INFO: rcu_preempt detected stall on CPU | 
| Paul E. McKenney | 6231069 | 2013-03-06 13:37:09 -0800 | [diff] [blame] | 102 | 	0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D | 
| Paul E. McKenney | 24cd7fd | 2012-01-20 17:35:55 -0800 | [diff] [blame] | 103 | 	   (t=65000 jiffies) | 
 | 104 |  | 
 | 105 | The "(64628 ticks this GP)" indicates that this CPU has taken more | 
 | 106 | than 64,000 scheduling-clock interrupts during the current stalled | 
 | 107 | grace period.  If the CPU was not yet aware of the current grace | 
 | 108 | period (for example, if it was offline), then this part of the message | 
 | 109 | indicates how many grace periods behind the CPU is. | 
 | 110 |  | 
 | 111 | The "idle=" portion of the message prints the dyntick-idle state. | 
 | 112 | The hex number before the first "/" is the low-order 12 bits of the | 
 | 113 | dynticks counter, which will have an even-numbered value if the CPU is | 
 | 114 | in dyntick-idle mode and an odd-numbered value otherwise.  The hex | 
 | 115 | number between the two "/"s is the value of the nesting, which will | 
 | 116 | be a small positive number if in the idle loop and a very large positive | 
 | 117 | number (as shown above) otherwise. | 
 | 118 |  | 
| Paul E. McKenney | 6231069 | 2013-03-06 13:37:09 -0800 | [diff] [blame] | 119 | The "softirq=" portion of the message tracks the number of RCU softirq | 
 | 120 | handlers that the stalled CPU has executed.  The number before the "/" | 
 | 121 | is the number that had executed since boot at the time that this CPU | 
 | 122 | last noted the beginning of a grace period, which might be the current | 
 | 123 | (stalled) grace period, or it might be some earlier grace period (for | 
 | 124 | example, if the CPU might have been in dyntick-idle mode for an extended | 
 | 125 | time period.  The number after the "/" is the number that have executed | 
 | 126 | since boot until the current time.  If this latter number stays constant | 
 | 127 | across repeated stall-warning messages, it is possible that RCU's softirq | 
 | 128 | handlers are no longer able to execute on this CPU.  This can happen if | 
 | 129 | the stalled CPU is spinning with interrupts are disabled, or, in -rt | 
 | 130 | kernels, if a high-priority process is starving RCU's softirq handler. | 
 | 131 |  | 
 | 132 | For CONFIG_RCU_FAST_NO_HZ kernels, the "last_accelerate:" prints the | 
 | 133 | low-order 16 bits (in hex) of the jiffies counter when this CPU last | 
 | 134 | invoked rcu_try_advance_all_cbs() from rcu_needs_cpu() or last invoked | 
 | 135 | rcu_accelerate_cbs() from rcu_prepare_for_idle().  The "nonlazy_posted:" | 
 | 136 | prints the number of non-lazy callbacks posted since the last call to | 
 | 137 | rcu_needs_cpu().  Finally, an "L" indicates that there are currently | 
 | 138 | no non-lazy callbacks ("." is printed otherwise, as shown above) and | 
 | 139 | "D" indicates that dyntick-idle processing is enabled ("." is printed | 
 | 140 | otherwise, for example, if disabled via the "nohz=" kernel boot parameter). | 
| Paul E. McKenney | 24cd7fd | 2012-01-20 17:35:55 -0800 | [diff] [blame] | 141 |  | 
 | 142 |  | 
 | 143 | Multiple Warnings From One Stall | 
 | 144 |  | 
 | 145 | If a stall lasts long enough, multiple stall-warning messages will be | 
 | 146 | printed for it.  The second and subsequent messages are printed at | 
 | 147 | longer intervals, so that the time between (say) the first and second | 
 | 148 | message will be about three times the interval between the beginning | 
 | 149 | of the stall and the first message. | 
 | 150 |  | 
 | 151 |  | 
 | 152 | What Causes RCU CPU Stall Warnings? | 
 | 153 |  | 
| Paul E. McKenney | f1d507b | 2010-04-15 15:49:46 -0700 | [diff] [blame] | 154 | So your kernel printed an RCU CPU stall warning.  The next question is | 
 | 155 | "What caused it?"  The following problems can result in RCU CPU stall | 
 | 156 | warnings: | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 157 |  | 
 | 158 | o	A CPU looping in an RCU read-side critical section. | 
 | 159 | 	 | 
| Paul E. McKenney | f1d507b | 2010-04-15 15:49:46 -0700 | [diff] [blame] | 160 | o	A CPU looping with interrupts disabled.  This condition can | 
 | 161 | 	result in RCU-sched and RCU-bh stalls. | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 162 |  | 
| Paul E. McKenney | f1d507b | 2010-04-15 15:49:46 -0700 | [diff] [blame] | 163 | o	A CPU looping with preemption disabled.  This condition can | 
 | 164 | 	result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh | 
 | 165 | 	stalls. | 
 | 166 |  | 
 | 167 | o	A CPU looping with bottom halves disabled.  This condition can | 
 | 168 | 	result in RCU-sched and RCU-bh stalls. | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 169 |  | 
 | 170 | o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel | 
 | 171 | 	without invoking schedule(). | 
 | 172 |  | 
| Paul E. McKenney | 2c96c77 | 2010-08-23 16:34:02 -0700 | [diff] [blame] | 173 | o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might | 
 | 174 | 	happen to preempt a low-priority task in the middle of an RCU | 
 | 175 | 	read-side critical section.   This is especially damaging if | 
 | 176 | 	that low-priority task is not permitted to run on any other CPU, | 
 | 177 | 	in which case the next RCU grace period can never complete, which | 
 | 178 | 	will eventually cause the system to run out of memory and hang. | 
 | 179 | 	While the system is in the process of running itself out of | 
 | 180 | 	memory, you might see stall-warning messages. | 
 | 181 |  | 
 | 182 | o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that | 
 | 183 | 	is running at a higher priority than the RCU softirq threads. | 
 | 184 | 	This will prevent RCU callbacks from ever being invoked, | 
 | 185 | 	and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent | 
 | 186 | 	RCU grace periods from ever completing.  Either way, the | 
 | 187 | 	system will eventually run out of memory and hang.  In the | 
 | 188 | 	CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning | 
 | 189 | 	messages. | 
 | 190 |  | 
| Paul E. McKenney | 2c01531 | 2011-10-02 17:21:18 -0700 | [diff] [blame] | 191 | o	A hardware or software issue shuts off the scheduler-clock | 
 | 192 | 	interrupt on a CPU that is not in dyntick-idle mode.  This | 
 | 193 | 	problem really has happened, and seems to be most likely to | 
| Frederic Weisbecker | 3451d02 | 2011-08-10 23:21:01 +0200 | [diff] [blame] | 194 | 	result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels. | 
| Paul E. McKenney | 2c01531 | 2011-10-02 17:21:18 -0700 | [diff] [blame] | 195 |  | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 196 | o	A bug in the RCU implementation. | 
 | 197 |  | 
 | 198 | o	A hardware failure.  This is quite unlikely, but has occurred | 
| Paul E. McKenney | f1d507b | 2010-04-15 15:49:46 -0700 | [diff] [blame] | 199 | 	at least once in real life.  A CPU failed in a running system, | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 200 | 	becoming unresponsive, but not causing an immediate crash. | 
 | 201 | 	This resulted in a series of RCU CPU stall warnings, eventually | 
 | 202 | 	leading the realization that the CPU had failed. | 
 | 203 |  | 
| Paul E. McKenney | 9ceae0e | 2011-11-03 13:43:24 -0700 | [diff] [blame] | 204 | The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning. | 
 | 205 | SRCU does not have its own CPU stall warnings, but its calls to | 
 | 206 | synchronize_sched() will result in RCU-sched detecting RCU-sched-related | 
 | 207 | CPU stalls.  Please note that RCU only detects CPU stalls when there is | 
 | 208 | a grace period in progress.  No grace period, no CPU stall warnings. | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 209 |  | 
| Paul E. McKenney | f1d507b | 2010-04-15 15:49:46 -0700 | [diff] [blame] | 210 | To diagnose the cause of the stall, inspect the stack traces. | 
 | 211 | The offending function will usually be near the top of the stack. | 
 | 212 | If you have a series of stall warnings from a single extended stall, | 
 | 213 | comparing the stack traces can often help determine where the stall | 
 | 214 | is occurring, which will usually be in the function nearest the top of | 
 | 215 | that portion of the stack which remains the same from trace to trace. | 
 | 216 | If you can reliably trigger the stall, ftrace can be quite helpful. | 
| Paul E. McKenney | 4c54005 | 2010-01-14 16:10:57 -0800 | [diff] [blame] | 217 |  | 
| Paul E. McKenney | 24cd7fd | 2012-01-20 17:35:55 -0800 | [diff] [blame] | 218 | RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE | 
 | 219 | and with RCU's event tracing. |