| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1 | RCU on Uniprocessor Systems | 
|  | 2 |  | 
|  | 3 |  | 
|  | 4 | A common misconception is that, on UP systems, the call_rcu() primitive | 
| Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 5 | may immediately invoke its function.  The basis of this misconception | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 6 | is that since there is only one CPU, it should not be necessary to | 
|  | 7 | wait for anything else to get done, since there are no other CPUs for | 
| Paul E. McKenney | a83f1fe | 2005-05-01 08:59:05 -0700 | [diff] [blame] | 8 | anything else to be happening on.  Although this approach will -sort- -of- | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 9 | work a surprising amount of the time, it is a very bad idea in general. | 
| Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 10 | This document presents three examples that demonstrate exactly how bad | 
|  | 11 | an idea this is. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 12 |  | 
|  | 13 |  | 
|  | 14 | Example 1: softirq Suicide | 
|  | 15 |  | 
|  | 16 | Suppose that an RCU-based algorithm scans a linked list containing | 
|  | 17 | elements A, B, and C in process context, and can delete elements from | 
|  | 18 | this same list in softirq context.  Suppose that the process-context scan | 
|  | 19 | is referencing element B when it is interrupted by softirq processing, | 
|  | 20 | which deletes element B, and then invokes call_rcu() to free element B | 
|  | 21 | after a grace period. | 
|  | 22 |  | 
|  | 23 | Now, if call_rcu() were to directly invoke its arguments, then upon return | 
|  | 24 | from softirq, the list scan would find itself referencing a newly freed | 
|  | 25 | element B.  This situation can greatly decrease the life expectancy of | 
|  | 26 | your kernel. | 
|  | 27 |  | 
| Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 28 | This same problem can occur if call_rcu() is invoked from a hardware | 
|  | 29 | interrupt handler. | 
|  | 30 |  | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 31 |  | 
|  | 32 | Example 2: Function-Call Fatality | 
|  | 33 |  | 
|  | 34 | Of course, one could avert the suicide described in the preceding example | 
|  | 35 | by having call_rcu() directly invoke its arguments only if it was called | 
|  | 36 | from process context.  However, this can fail in a similar manner. | 
|  | 37 |  | 
|  | 38 | Suppose that an RCU-based algorithm again scans a linked list containing | 
|  | 39 | elements A, B, and C in process contexts, but that it invokes a function | 
|  | 40 | on each element as it is scanned.  Suppose further that this function | 
|  | 41 | deletes element B from the list, then passes it to call_rcu() for deferred | 
|  | 42 | freeing.  This may be a bit unconventional, but it is perfectly legal | 
|  | 43 | RCU usage, since call_rcu() must wait for a grace period to elapse. | 
|  | 44 | Therefore, in this case, allowing call_rcu() to immediately invoke | 
|  | 45 | its arguments would cause it to fail to make the fundamental guarantee | 
|  | 46 | underlying RCU, namely that call_rcu() defers invoking its arguments until | 
|  | 47 | all RCU read-side critical sections currently executing have completed. | 
|  | 48 |  | 
| Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 49 | Quick Quiz #1: why is it -not- legal to invoke synchronize_rcu() in | 
|  | 50 | this case? | 
|  | 51 |  | 
|  | 52 |  | 
|  | 53 | Example 3: Death by Deadlock | 
|  | 54 |  | 
|  | 55 | Suppose that call_rcu() is invoked while holding a lock, and that the | 
|  | 56 | callback function must acquire this same lock.  In this case, if | 
|  | 57 | call_rcu() were to directly invoke the callback, the result would | 
|  | 58 | be self-deadlock. | 
|  | 59 |  | 
|  | 60 | In some cases, it would possible to restructure to code so that | 
|  | 61 | the call_rcu() is delayed until after the lock is released.  However, | 
|  | 62 | there are cases where this can be quite ugly: | 
|  | 63 |  | 
|  | 64 | 1.	If a number of items need to be passed to call_rcu() within | 
|  | 65 | the same critical section, then the code would need to create | 
|  | 66 | a list of them, then traverse the list once the lock was | 
|  | 67 | released. | 
|  | 68 |  | 
|  | 69 | 2.	In some cases, the lock will be held across some kernel API, | 
|  | 70 | so that delaying the call_rcu() until the lock is released | 
|  | 71 | requires that the data item be passed up via a common API. | 
|  | 72 | It is far better to guarantee that callbacks are invoked | 
|  | 73 | with no locks held than to have to modify such APIs to allow | 
|  | 74 | arbitrary data items to be passed back up through them. | 
|  | 75 |  | 
|  | 76 | If call_rcu() directly invokes the callback, painful locking restrictions | 
|  | 77 | or API changes would be required. | 
|  | 78 |  | 
|  | 79 | Quick Quiz #2: What locking restriction must RCU callbacks respect? | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 80 |  | 
|  | 81 |  | 
|  | 82 | Summary | 
|  | 83 |  | 
| Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 84 | Permitting call_rcu() to immediately invoke its arguments breaks RCU, | 
|  | 85 | even on a UP system.  So do not do it!  Even on a UP system, the RCU | 
|  | 86 | infrastructure -must- respect grace periods, and -must- invoke callbacks | 
|  | 87 | from a known environment in which no locks are held. | 
|  | 88 |  | 
|  | 89 | It -is- safe for synchronize_sched() and synchronize_rcu_bh() to return | 
|  | 90 | immediately on an UP system.  It is also safe for synchronize_rcu() | 
|  | 91 | to return immediately on UP systems, except when running preemptable | 
|  | 92 | RCU. | 
|  | 93 |  | 
|  | 94 | Quick Quiz #3: Why can't synchronize_rcu() return immediately on | 
|  | 95 | UP systems running preemptable RCU? | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 96 |  | 
|  | 97 |  | 
| Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 98 | Answer to Quick Quiz #1: | 
|  | 99 | Why is it -not- legal to invoke synchronize_rcu() in this case? | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 100 |  | 
| Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 101 | Because the calling function is scanning an RCU-protected linked | 
|  | 102 | list, and is therefore within an RCU read-side critical section. | 
|  | 103 | Therefore, the called function has been invoked within an RCU | 
|  | 104 | read-side critical section, and is not permitted to block. | 
|  | 105 |  | 
|  | 106 | Answer to Quick Quiz #2: | 
|  | 107 | What locking restriction must RCU callbacks respect? | 
|  | 108 |  | 
|  | 109 | Any lock that is acquired within an RCU callback must be | 
|  | 110 | acquired elsewhere using an _irq variant of the spinlock | 
|  | 111 | primitive.  For example, if "mylock" is acquired by an | 
|  | 112 | RCU callback, then a process-context acquisition of this | 
|  | 113 | lock must use something like spin_lock_irqsave() to | 
|  | 114 | acquire the lock. | 
|  | 115 |  | 
|  | 116 | If the process-context code were to simply use spin_lock(), | 
|  | 117 | then, since RCU callbacks can be invoked from softirq context, | 
|  | 118 | the callback might be called from a softirq that interrupted | 
|  | 119 | the process-context critical section.  This would result in | 
|  | 120 | self-deadlock. | 
|  | 121 |  | 
|  | 122 | This restriction might seem gratuitous, since very few RCU | 
|  | 123 | callbacks acquire locks directly.  However, a great many RCU | 
|  | 124 | callbacks do acquire locks -indirectly-, for example, via | 
|  | 125 | the kfree() primitive. | 
| Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 126 |  | 
|  | 127 | Answer to Quick Quiz #3: | 
|  | 128 | Why can't synchronize_rcu() return immediately on UP systems | 
|  | 129 | running preemptable RCU? | 
|  | 130 |  | 
|  | 131 | Because some other task might have been preempted in the middle | 
|  | 132 | of an RCU read-side critical section.  If synchronize_rcu() | 
|  | 133 | simply immediately returned, it would prematurely signal the | 
|  | 134 | end of the grace period, which would come as a nasty shock to | 
|  | 135 | that other thread when it started running again. |