PCI Power Management
~~~~~~~~~~~~~~~~~~~~

An overview of the concepts and the related functions in the Linux kernel

Patrick Mochel <mochel@transmeta.com>
(and others)

---------------------------------------------------------------------------

1. Overview
2. How the PCI Subsystem Handles Power Management
3. PCI Utility Functions
4. PCI Device Drivers
5. Resources

1. Overview
~~~~~~~~~~~

The PCI Power Management Specification was introduced between the PCI 2.1 and
PCI 2.2 Specifications. It is a standard interface for controlling various
power management operations.

Implementation of the PCI PM Spec is optional, as are several sub-components of
it. If a device supports the PCI PM Spec, the device will have an 8 byte
capability field in its PCI configuration space. This field is used to describe
and control the standard PCI power management features.
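
As a rough illustration of how that capability field is located, the sketch
below walks the capability list in a snapshot of configuration space. It is
plain userspace C working on a byte array, not kernel code; the function name
find_pm_capability() and the cfg[] snapshot are assumptions made for this
example. The offsets used (Status register at 0x06, capabilities pointer at
0x34) and the Power Management capability ID (0x01) are the standard PCI
values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_STATUS          0x06  /* Status register, 16 bits, little-endian */
#define CFG_STATUS_CAP_LIST 0x10  /* Status bit 4: capability list present */
#define CFG_CAP_PTR         0x34  /* offset of the first capability */
#define CAP_ID_PM           0x01  /* Power Management capability ID */

/*
 * Return the config-space offset of the PM capability, or 0 if the device
 * does not implement one. 'cfg' is a hypothetical 256-byte snapshot of the
 * device's configuration space.
 */
int find_pm_capability(const uint8_t cfg[256])
{
    uint16_t status;
    uint8_t pos;
    int ttl = 48;                      /* guard against a looping list */

    assert(cfg != NULL);
    status = (uint16_t)(cfg[CFG_STATUS] | (cfg[CFG_STATUS + 1] << 8));
    if (!(status & CFG_STATUS_CAP_LIST))
        return 0;

    pos = cfg[CFG_CAP_PTR] & 0xFC;     /* low two bits are reserved */
    while (pos >= 0x40 && ttl-- > 0) { /* capabilities live above the header */
        if (cfg[pos] == CAP_ID_PM)
            return pos;
        pos = cfg[pos + 1] & 0xFC;     /* follow the next-capability pointer */
    }
    return 0;
}
```

In the kernel itself a driver would not parse configuration space by hand; the
sketch only shows where the 8 byte field sits and how it is found.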

The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses
(B0 - B3). The higher the number, the less power the device consumes. However,
the higher the number, the longer the latency is for the device to return to
an operational state (D0).

There are actually two D3 states.  When someone talks about D3, they usually
mean D3hot, which corresponds to an ACPI D2 state (power is reduced, the
device may lose some context).  But they may also mean D3cold, which is an
ACPI D3 state (power is fully off, all state is discarded); or both.

Bus power management is not covered in this version of this document.

Note that all PCI devices support D0 and D3cold by default, regardless of
whether or not they implement any of the PCI PM spec.

The possible state transitions that a device can undergo are:

+---------------------------+
| Current State | New State |
+---------------------------+
| D0            | D1, D2, D3|
+---------------------------+
| D1            | D2, D3    |
+---------------------------+
| D2            | D3        |
+---------------------------+
| D1, D2, D3    | D0        |
+---------------------------+
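
The table can be condensed into a single predicate. The following is a minimal
sketch in userspace C; the enum d_state and the function transition_allowed()
are names invented for this example and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

enum d_state { D0 = 0, D1, D2, D3 };

/*
 * Encode the transition table: a device may move to any deeper
 * (higher-numbered) state, or from any low-power state straight back to D0.
 */
bool transition_allowed(enum d_state from, enum d_state to)
{
    assert(from <= D3 && to <= D3);

    if (to > from)
        return true;                /* D0->D1/D2/D3, D1->D2/D3, D2->D3 */
    return to == D0 && from != D0;  /* D1/D2/D3 -> D0 */
}
```

For example, transition_allowed(D2, D1) is false because the table has no row
that moves a device to a shallower state other than D0.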

Note that when the system is entering a global suspend state, all devices will
be placed into D3 and when resuming, all devices will be placed into D0.
However, when the system is running, other state transitions are possible.

2. How The PCI Subsystem Handles Power Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCI suspend/resume functionality is accessed indirectly via the Power
Management subsystem. At boot, the PCI driver registers a power management
callback with that layer. Upon entering a suspend state, the PM layer iterates
through all of its registered callbacks. This currently takes place only during
APM state transitions.

Upon going to sleep, the PCI subsystem walks its device tree twice. Both times,
it does a depth first walk of the device tree. The first walk saves each
device's state and checks for devices that will prevent the system from entering
a global power state. The next walk then places the devices in a low power
state.

The first walk allows a graceful recovery in the event of a failure, since none
of the devices have actually been powered down.

In both walks, in particular the second, all children of a bridge are touched
before the actual bridge itself. This allows the bridge to retain power while
its children are being accessed.

Upon resuming from sleep, just the opposite must be true: all bridges must be
powered on and restored before their children are powered on. This is easily
accomplished with a breadth-first walk of the PCI device tree.
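
The two orderings can be illustrated with a toy device tree. The sketch below
is userspace C with invented types (struct node, struct trace) and invented
function names; it only demonstrates the visiting order described above and
does not reflect how the PCI core stores or walks its devices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 32

struct node {
    const char *name;
    struct node *child;     /* first child */
    struct node *sibling;   /* next sibling under the same bridge */
};

struct trace { char buf[64]; };   /* records the visiting order */

typedef void (*visit_fn)(struct node *n, void *ctx);

/* Append the node's name to the trace passed in through ctx. */
void record(struct node *n, void *ctx)
{
    struct trace *t = ctx;

    strncat(t->buf, n->name, sizeof(t->buf) - strlen(t->buf) - 1);
}

/* Suspend order: depth first, every child is visited before its bridge. */
void walk_children_first(struct node *n, visit_fn visit, void *ctx)
{
    assert(n != NULL);
    for (struct node *c = n->child; c; c = c->sibling)
        walk_children_first(c, visit, ctx);
    visit(n, ctx);
}

/* Resume order: breadth first, every bridge is visited before its children. */
void walk_breadth_first(struct node *root, visit_fn visit, void *ctx)
{
    struct node *queue[MAX_NODES];
    size_t head = 0, tail = 0;

    assert(root != NULL);
    queue[tail++] = root;
    while (head < tail) {
        struct node *n = queue[head++];

        visit(n, ctx);
        for (struct node *c = n->child; c && tail < MAX_NODES; c = c->sibling)
            queue[tail++] = c;
    }
}
```

Passing record() as the visitor to both walks on the same tree yields two
different sequences: leaves first when suspending, the root bridge first when
resuming.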


3. PCI Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~

These are helper functions designed to be called by individual device drivers.
Assuming that a device behaves as advertised, these should be applicable in most
cases. However, results may vary.

Note that these functions are never implicitly called for the driver. The driver
is always responsible for deciding when and if to call these.


pci_save_state
--------------

Usage:
        pci_save_state(struct pci_dev *dev);

Description:
        Save first 64 bytes of PCI config space, along with any additional
        PCI-Express or PCI-X information.


pci_restore_state
-----------------

Usage:
        pci_restore_state(struct pci_dev *dev);

Description:
        Restore previously saved config space.


pci_set_power_state
-------------------

Usage:
        pci_set_power_state(struct pci_dev *dev, pci_power_t state);

Description:
        Transition device to low power state using PCI PM Capabilities
        registers.

        Will fail under one of the following conditions:
        - If state is less than current state, but not D0 (illegal transition)
        - Device doesn't support PM Capabilities
        - Device does not support requested state
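
The three failure conditions can be modelled in a few lines. This is a
userspace sketch under stated assumptions, not the kernel implementation:
struct model_dev and model_set_power_state() are hypothetical, the return
value is simply 0 or -1 rather than the kernel's error codes, and no hardware
register is written. The D1 and D2 support bits (PMC bits 9 and 10) are taken
from the PCI PM capability layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PM_CAP_D1 0x0200  /* PMC bit 9: D1 state supported */
#define PM_CAP_D2 0x0400  /* PMC bit 10: D2 state supported */

struct model_dev {            /* hypothetical stand-in, not struct pci_dev */
    bool has_pm_cap;          /* device has a PM capability */
    uint16_t pmc;             /* PMC register contents */
    int current_state;        /* 0..3 for D0..D3 */
};

/* Return 0 on success, -1 if the request must be rejected. */
int model_set_power_state(struct model_dev *dev, int state)
{
    assert(dev != NULL);

    if (state < 0 || state > 3)
        return -1;
    if (!dev->has_pm_cap)
        return -1;            /* no PM Capabilities */
    if (state != 0 && state < dev->current_state)
        return -1;            /* illegal transition */
    if (state == 1 && !(dev->pmc & PM_CAP_D1))
        return -1;            /* D1 not supported */
    if (state == 2 && !(dev->pmc & PM_CAP_D2))
        return -1;            /* D2 not supported */

    dev->current_state = state;
    return 0;
}
```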


pci_enable_wake
---------------

Usage:
        pci_enable_wake(struct pci_dev *dev, pci_power_t state, int enable);

Description:
        Enable device to generate PME# during low power state using PCI PM
        Capabilities.

        Checks whether the device supports generating PME# from the requested
        state and fails if it does not, unless enable == 0 (the request is to
        disable wake events, which is implicit if the device does not support
        them in the first place).

        Note that the PMC Register in the device's PM Capabilities has a bitmask
        of the states it supports generating PME# from. D3hot is bit 3 and
        D3cold is bit 4. So, while a value of 4 as the state may not seem
        semantically correct, it is.
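
The bitmask test described above can be sketched as follows. This is userspace
C for illustration only; pme_supported_from() and model_enable_wake() are
invented names, and the model returns 0 or -1 instead of the kernel's error
codes. It assumes the PME support bitmask occupies bits 15:11 of the PMC
register, as laid out in the PCI PM capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMC_PME_SHIFT 11    /* PME support bitmask starts at PMC bit 11 */
#define PMC_PME_MASK  0x1F  /* five bits: D0, D1, D2, D3hot, D3cold */

/* state: 0..2 for D0..D2, 3 for D3hot, 4 for D3cold */
bool pme_supported_from(uint16_t pmc, unsigned int state)
{
    unsigned int pme = (pmc >> PMC_PME_SHIFT) & PMC_PME_MASK;

    assert(state <= 4);
    return (pme >> state) & 1u;
}

/* Disabling wake always succeeds; enabling needs PME# support from 'state'. */
int model_enable_wake(uint16_t pmc, unsigned int state, int enable)
{
    if (!enable)
        return 0;
    return pme_supported_from(pmc, state) ? 0 : -1;
}
```

With a PMC value of 0xC800, for instance, bits 11, 14 and 15 are set, so the
model reports PME# support from D0, D3hot and D3cold but not from D1 or D2.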


4. PCI Device Drivers
~~~~~~~~~~~~~~~~~~~~~

These functions are intended for use by individual drivers, and are defined in
struct pci_driver:

        int  (*suspend) (struct pci_dev *dev, pm_message_t state);
        int  (*resume) (struct pci_dev *dev);


suspend
-------

Usage:

        if (dev->driver && dev->driver->suspend)
                dev->driver->suspend(dev, state);

A driver uses this function to actually transition the device into a low power
state. This should include disabling I/O, IRQs, and bus-mastering, as well as
physically transitioning the device to a lower power state; it may also include
calls to pci_enable_wake().

Bus mastering may be disabled by doing:

        pci_disable_device(dev);

For devices that support the PCI PM Spec, this may be used to set the device's
power state to match the suspend() parameter:

        pci_set_power_state(dev, pci_choose_state(dev, state));

The driver is also responsible for disabling any other device-specific features
(e.g. blanking the screen, turning off on-card memory, etc.).

The driver should be sure to track the current state of the device, as it may
obviate the need for some operations.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.

resume
------

Usage:

        if (dev->driver && dev->driver->resume)
                dev->driver->resume(dev);

The resume callback may be called from any power state, and is always meant to
transition the device to the D0 state.

The driver is responsible for reenabling any features of the device that had
been disabled during previous suspend calls, such as IRQs and bus mastering,
as well as calling pci_restore_state().

If the device is currently in D3, it may need to be reinitialized in resume().

  * Some types of devices, like bus controllers, will preserve context in D3hot
    (using Vcc power).  Their drivers will often want to avoid re-initializing
    them after re-entering D0 (perhaps to avoid resetting downstream devices).

  * Other kinds of devices in D3hot will discard device context as part of a
    soft reset when re-entering the D0 state.

  * Devices resuming from D3cold always go through a power-on reset.  Some
    device context can also be preserved using Vaux power.

  * Some systems hide D3cold resume paths from drivers.  For example, on PCs
    the resume path for suspend-to-disk often runs BIOS powerup code, which
    will sometimes re-initialize the device.

To handle resets during D3 to D0 transitions, it may be convenient to share
device initialization code between probe() and resume().  Device parameters
can also be saved before the driver suspends into D3, avoiding re-probe.

If the device supports the PCI PM Spec, it can use this to physically transition
the device to D0:

        pci_set_power_state(dev, PCI_D0);

Note that if the entire system is transitioning out of a global sleep state, all
devices will be placed in the D0 state, so this is not necessary. However, in
the event that the device is placed in the D3 state during normal operation,
this call is necessary. It is impossible to determine which of the two events is
taking place in the driver, so it is always a good idea to make that call.

The driver should take note of the state that it is resuming from in order to
ensure correct (and speedy) operation.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.


A reference implementation
--------------------------
.suspend()
{
        /* driver specific operations */

        /* Disable IRQ */
        free_irq();
        /* If using MSI */
        pci_disable_msi();

        pci_save_state();
        pci_enable_wake();
        /* Disable IO/bus master/irq router */
        pci_disable_device();
        pci_set_power_state(pci_choose_state());
}

.resume()
{
        pci_set_power_state(PCI_D0);
        pci_restore_state();
        /* the device's irq may have changed, driver should take care */
        pci_enable_device();
        pci_set_master();

        /* if using MSI, the device's vector may have changed */
        pci_enable_msi();

        request_irq();
        /* driver specific operations */
}

This is a typical implementation, written as pseudocode with the function
arguments omitted. Drivers can slightly change the order of the operations,
skip some of them or add more driver specific operations, but on the whole
drivers should do something like this.

5. Resources
~~~~~~~~~~~~

PCI Local Bus Specification
PCI Bus Power Management Interface Specification

  http://www.pcisig.com