Why the "volatile" type class should not be used
------------------------------------------------

C programmers have often taken volatile to mean that the variable could be
changed outside of the current thread of execution; as a result, they are
sometimes tempted to use it in kernel code when shared data structures are
being used.  In other words, they have been known to treat volatile types
as a sort of easy atomic variable, which they are not.  The use of volatile in
kernel code is almost never correct; this document describes why.

The key point to understand with regard to volatile is that its purpose is
to suppress optimization, which is almost never what one really wants to
do.  In the kernel, one must protect shared data structures against
unwanted concurrent access, which is very much a different task.  The
process of protecting against unwanted concurrency will also avoid almost
all optimization-related problems in a more efficient way.

Like volatile, the kernel primitives which make concurrent access to data
safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
unwanted optimization.  If they are being used properly, there will be no
need to use volatile as well.  If volatile is still necessary, there is
almost certainly a bug in the code somewhere.  In properly-written kernel
code, volatile can only serve to slow things down.

Consider a typical block of kernel code:

    spin_lock(&the_lock);
    do_something_on(&shared_data);
    do_something_else_with(&shared_data);
    spin_unlock(&the_lock);

If all the code follows the locking rules, the value of shared_data cannot
change unexpectedly while the_lock is held.  Any other code which might
want to play with that data will be waiting on the lock.  The spinlock
primitives act as memory barriers - they are explicitly written to do so -
meaning that data accesses will not be optimized across them.  So the
compiler might think it knows what will be in shared_data, but the
spin_lock() call, since it acts as a memory barrier, will force it to
forget anything it knows.  There will be no optimization problems with
accesses to that data.

If shared_data were declared volatile, the locking would still be
necessary.  But the compiler would also be prevented from optimizing access
to shared_data _within_ the critical section, when we know that nobody else
can be working with it.  While the lock is held, shared_data is not
volatile.  When dealing with shared data, proper locking makes volatile
unnecessary - and potentially harmful.

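As a rough userspace illustration of the point above, the following sketch
guards a plain, non-volatile counter with a lock.  It is an analogy only: it
assumes POSIX threads, with pthread_mutex_lock() standing in for spin_lock(),
and the names shared_counter, worker() and run_counter_demo() are
hypothetical.

```c
#include <pthread.h>

#define MAX_THREADS 16

/* Hypothetical shared data: a plain long, deliberately NOT volatile. */
static long shared_counter;
static pthread_mutex_t the_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the counter only while holding the lock. */
static void *worker(void *arg)
{
    int iterations = *(const int *)arg;

    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&the_lock);
        shared_counter++;   /* the compiler may optimize freely in here */
        pthread_mutex_unlock(&the_lock);
    }
    return NULL;
}

/* Runs up to MAX_THREADS workers and returns the final counter value. */
long run_counter_demo(int nthreads, int iterations)
{
    pthread_t threads[MAX_THREADS];
    int started = 0;
    long result;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    pthread_mutex_lock(&the_lock);
    shared_counter = 0;
    pthread_mutex_unlock(&the_lock);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[started], NULL, worker, &iterations) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_lock(&the_lock);
    result = shared_counter;
    pthread_mutex_unlock(&the_lock);
    return result;
}
```

Provided all threads start, run_counter_demo(4, 10000) should return 40000
even though shared_counter is an ordinary variable; declaring it volatile
would add nothing beyond slower code inside the critical section.
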
The volatile type qualifier was originally meant for memory-mapped I/O
registers.  Within the kernel, register accesses, too, should be protected
by locks, but one also does not want the compiler "optimizing" register
accesses within a critical section.  But, within the kernel, I/O memory
accesses are always done through accessor functions; accessing I/O memory
directly through pointers is frowned upon and does not work on all
architectures.  Those accessors are written to prevent unwanted
optimization, so, once again, volatile is unnecessary.

Another situation where one might be tempted to use volatile is
when the processor is busy-waiting on the value of a variable.  The right
way to perform a busy wait is:

    while (my_variable != what_i_want)
        cpu_relax();

The cpu_relax() call can lower CPU power consumption or yield to a
hyperthreaded twin processor; it also happens to serve as a compiler
barrier, so, once again, volatile is unnecessary.  Of course, busy-waiting
is generally an anti-social act to begin with.

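The loop above can be approximated in userspace.  This is a sketch under
loud assumptions: it relies on the GCC/Clang extended asm syntax, defines its
own cpu_relax() stand-in that is nothing more than a compiler barrier (the
real kernel macro is architecture-specific), and uses the hypothetical names
setter() and busy_wait_demo().  Strictly portable C11 code would use
<stdatomic.h> instead, since an unsynchronized plain read is formally a data
race there; the aim is only to show the shape of the kernel idiom.

```c
#include <pthread.h>

/*
 * Assumption: userspace stand-in for the kernel's cpu_relax().
 * The empty asm with a "memory" clobber is a compiler barrier: it forces
 * the compiler to reload my_variable on every pass through the loop.
 */
#define cpu_relax() __asm__ __volatile__("" ::: "memory")

/* Hypothetical flag: a plain int, deliberately NOT volatile. */
static int my_variable;

/* Second thread: sets the flag the waiter is spinning on. */
static void *setter(void *unused)
{
    (void)unused;
    my_variable = 1;
    return NULL;
}

/* Spins until the other thread sets the flag; returns its final value. */
int busy_wait_demo(void)
{
    const int what_i_want = 1;
    pthread_t thread;

    my_variable = 0;
    if (pthread_create(&thread, NULL, setter, NULL) != 0)
        return -1;

    while (my_variable != what_i_want)
        cpu_relax();

    pthread_join(thread, NULL);
    return my_variable;
}
```

Without the barrier in the loop body, the compiler would be free to read
my_variable once and spin forever on the stale value; with it, no volatile
declaration is needed.
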
There are still a few rare situations where volatile makes sense in the
kernel:

  - The above-mentioned accessor functions might use volatile on
    architectures where direct I/O memory access does work.  Essentially,
    each accessor call becomes a little critical section on its own and
    ensures that the access happens as expected by the programmer.

  - Inline assembly code which changes memory, but which has no other
    visible side effects, risks being deleted by GCC.  Adding the volatile
    keyword to asm statements will prevent this removal.

  - The jiffies variable is special in that it can have a different value
    every time it is referenced, but it can be read without any special
    locking.  So jiffies can be volatile, but the addition of other
    variables of this type is strongly frowned upon.  Jiffies is considered
    to be a "stupid legacy" issue (Linus's words) in this regard; fixing it
    would be more trouble than it is worth.

  - Pointers to data structures in coherent memory which might be modified
    by I/O devices can, sometimes, legitimately be volatile.  A ring buffer
    used by a network adapter, where that adapter changes pointers to
    indicate which descriptors have been processed, is an example of this
    type of situation.

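The first item in the list above can be sketched as follows.  These are
hypothetical helpers, not the kernel's own readl()/writel() implementations,
which are architecture-specific.  The idea shown is that volatile is applied
at the point of access, inside the accessor, rather than on the declaration
of the data itself.

```c
#include <stdint.h>

/*
 * Hypothetical accessor: the volatile cast keeps the compiler from
 * eliding this 32-bit load or merging it with a neighbouring one.
 */
static inline uint32_t sketch_readl(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/*
 * Hypothetical accessor: the volatile cast keeps the compiler from
 * eliding this 32-bit store or merging it with a neighbouring one.
 */
static inline void sketch_writel(uint32_t value, volatile void *addr)
{
    *(volatile uint32_t *)addr = value;
}
```

In userspace these can be exercised against an ordinary uint32_t variable
standing in for a register: a value written with sketch_writel() is what a
following sketch_readl() returns.
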
For most code, none of the above justifications for volatile apply.  As a
result, the use of volatile is likely to be seen as a bug and will bring
additional scrutiny to the code.  Developers who are tempted to use
volatile should take a step back and think about what they are truly trying
to accomplish.

Patches to remove volatile variables are generally welcome - as long as
they come with a justification which shows that the concurrency issues have
been properly thought through.


NOTES
-----

[1] http://lwn.net/Articles/233481/
[2] http://lwn.net/Articles/233482/

CREDITS
-------

Original impetus and research by Randy Dunlap
Written by Jonathan Corbet
Improvements via comments from Satyam Sharma, Johannes Stezenbach, Jesper
	Juhl, Heikki Orsila, H. Peter Anvin, Philipp Hahn, and Stefan
	Richter.