PCI Power Management

Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.

An overview of concepts and the Linux kernel's interfaces related to PCI power
management.  Based on previous work by Patrick Mochel <mochel@transmeta.com>
(and others).

This document only covers the aspects of power management specific to PCI
devices.  For a general description of the kernel's interfaces related to device
power management refer to Documentation/power/devices.txt and
Documentation/power/runtime_pm.txt.

---------------------------------------------------------------------------

1. Hardware and Platform Support for PCI Power Management
2. PCI Subsystem and Device Power Management
3. PCI Device Drivers and Power Management
4. Resources


1. Hardware and Platform Support for PCI Power Management
=========================================================

1.1. Native and Platform-Based Power Management
-----------------------------------------------
In general, power management is a feature allowing one to save energy by putting
devices into states in which they draw less power (low-power states) at the
price of reduced functionality or performance.

Usually, a device is put into a low-power state when it is underutilized or
completely inactive.  However, when it is necessary to use the device once
again, it has to be put back into the "fully functional" state (full-power
state).  This may happen when there are some data for the device to handle or
as a result of an external event requiring the device to be active, which may
be signaled by the device itself.

PCI devices may be put into low-power states in two ways: by using the device
capabilities introduced by the PCI Bus Power Management Interface Specification,
or with the help of platform firmware, such as an ACPI BIOS.  In the first
approach, which is referred to as native PCI power management (native PCI PM)
in what follows, the device power state is changed as a result of writing a
specific value into one of its standard configuration registers.  The second
approach requires the platform firmware to provide special methods that may be
used by the kernel to change the device's power state.

Devices supporting the native PCI PM usually can generate wakeup signals called
Power Management Events (PMEs) to let the kernel know about external events
requiring the device to be active.  After receiving a PME the kernel is supposed
to put the device that sent it into the full-power state.  However, the PCI Bus
Power Management Interface Specification doesn't define any standard method of
delivering the PME from the device to the CPU and the operating system kernel.
It is assumed that the platform firmware will perform this task and therefore,
even though a PCI device is set up to generate PMEs, it also may be necessary to
prepare the platform firmware for notifying the CPU of the PMEs coming from the
device (e.g. by generating interrupts).

In turn, if the methods provided by the platform firmware are used for changing
the power state of a device, usually the platform also provides a method for
preparing the device to generate wakeup signals.  In that case, however, it
often also is necessary to prepare the device for generating PMEs using the
native PCI PM mechanism, because the method provided by the platform depends on
that.

Thus in many situations both the native and the platform-based power management
mechanisms have to be used simultaneously to obtain the desired result.

1.2. Native PCI Power Management
--------------------------------
The PCI Bus Power Management Interface Specification (PCI PM Spec) was
introduced between the PCI 2.1 and PCI 2.2 Specifications.  It defined a
standard interface for performing various operations related to power
management.

The implementation of the PCI PM Spec is optional for conventional PCI devices,
but it is mandatory for PCI Express devices.  If a device supports the PCI PM
Spec, it has an 8-byte power management capability field in its PCI
configuration space.  This field is used to describe and control the standard
features related to the native PCI power management.

The PCI PM Spec defines 4 operating states for devices (D0-D3) and for buses
(B0-B3).  The higher the number, the less power is drawn by the device or bus
in that state.  However, the higher the number, the longer the latency for
the device or bus to return to the full-power state (D0 or B0, respectively).

There are two variants of the D3 state defined by the specification.  The first
one is D3hot, referred to as the software accessible D3, because devices can be
programmed to go into it.  The second one, D3cold, is the state that PCI devices
are in when the supply voltage (Vcc) is removed from them.  It is not possible
to program a PCI device to go into D3cold, although there may be a programmable
interface for putting the bus the device is on into a state in which Vcc is
removed from all devices on the bus.

PCI bus power management, however, is not supported by the Linux kernel at the
time of this writing and therefore it is not covered by this document.

Note that every PCI device can be in the full-power state (D0) or in D3cold,
regardless of whether or not it implements the PCI PM Spec.  In addition to
that, if the PCI PM Spec is implemented by the device, it must support D3hot
as well as D0.  The support for the D1 and D2 power states is optional.

PCI devices supporting the PCI PM Spec can be programmed to go to any of the
supported low-power states (except for D3cold).  While in D1-D3hot the
standard configuration registers of the device must be accessible to software
(i.e. the device is required to respond to PCI configuration accesses), although
its I/O and memory spaces are then disabled.  This allows the device to be
programmatically put into D0.  Thus the kernel can switch the device back and
forth between D0 and the supported low-power states (except for D3cold) and the
possible power state transitions the device can undergo are the following:

+----------------------------+
| Current State | New State  |
+----------------------------+
| D0            | D1, D2, D3 |
+----------------------------+
| D1            | D2, D3     |
+----------------------------+
| D2            | D3         |
+----------------------------+
| D1, D2, D3    | D0         |
+----------------------------+
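
The table above can be read as a simple rule: software may move a device to any
deeper state, or straight back to D0.  The following user-space sketch encodes
that rule.  The enum and the function name are made up for this illustration
(the kernel itself uses pci_power_t), and D3 here stands for D3hot only, since
D3cold cannot be entered by programming the device.

```c
#include <stdbool.h>

/* Hypothetical enum for this sketch; the kernel uses pci_power_t instead. */
enum d_state { D0 = 0, D1 = 1, D2 = 2, D3HOT = 3 };

/*
 * Return true if software may program a direct transition from 'from' to
 * 'to' according to the transition table.  D3cold is left out on purpose,
 * because it is entered and left by removing and restoring power.
 */
static bool d_transition_allowed(enum d_state from, enum d_state to)
{
	if (from == to)
		return false;	/* staying in place is not a transition */
	if (to == D0)
		return true;	/* D1, D2, D3 -> D0 */
	return to > from;	/* otherwise only towards deeper states */
}
```

For example, d_transition_allowed(D0, D3HOT) is true, whereas
d_transition_allowed(D2, D1) is false: the device would have to go through D0.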

The transition from D3cold to D0 occurs when the supply voltage is provided to
the device (i.e. power is restored).  In that case the device returns to D0 with
a full power-on reset sequence and the power-on defaults are restored to the
device by hardware just as at initial power up.

PCI devices supporting the PCI PM Spec can be programmed to generate PMEs
while in a low-power state (D1-D3), but they are not required to be capable
of generating PMEs from all supported low-power states.  In particular, the
capability of generating PMEs from D3cold is optional and depends on the
presence of additional voltage (3.3Vaux) allowing the device to remain
sufficiently active to generate a wakeup signal.

1.3. ACPI Device Power Management
---------------------------------
The platform firmware support for the power management of PCI devices is
system-specific.  However, if the system in question is compliant with the
Advanced Configuration and Power Interface (ACPI) Specification, like the
majority of x86-based systems, it is supposed to implement device power
management interfaces defined by the ACPI standard.

For this purpose the ACPI BIOS provides special functions called "control
methods" that may be executed by the kernel to perform specific tasks, such as
putting a device into a low-power state.  These control methods are encoded
using a special byte-code language called the ACPI Machine Language (AML) and
stored in the machine's BIOS.  The kernel loads them from the BIOS and executes
them as needed using an AML interpreter that translates the AML byte code into
computations and memory or I/O space accesses.  This way, in theory, a BIOS
writer can provide the kernel with a means to perform actions depending
on the system design in a system-specific fashion.

ACPI control methods may be divided into global control methods, which are not
associated with any particular devices, and device control methods, which have
to be defined separately for each device supposed to be handled with the help of
the platform.  This means, in particular, that ACPI device control methods can
only be used to handle devices that the BIOS writer knew about in advance.  The
ACPI methods used for device power management fall into that category.

The ACPI specification assumes that devices can be in one of four power states
labeled as D0, D1, D2, and D3 that roughly correspond to the native PCI PM
D0-D3 states (although the difference between D3hot and D3cold is not taken
into account by ACPI).  Moreover, for each power state of a device there is a
set of power resources that have to be enabled for the device to be put into
that state.  These power resources are controlled (i.e. enabled or disabled)
with the help of their own control methods, _ON and _OFF, that have to be
defined individually for each of them.

To put a device into the ACPI power state Dx (where x is a number between 0 and
3 inclusive) the kernel is supposed to (1) enable the power resources required
by the device in this state using their _ON control methods and (2) execute the
_PSx control method defined for the device.  In addition to that, if the device
is going to be put into a low-power state (D1-D3) and is supposed to generate
wakeup signals from that state, the _DSW (or _PSW, replaced with _DSW by ACPI
3.0) control method defined for it has to be executed before _PSx.  Power
resources that are not required by the device in the target power state and are
not required any more by any other device should be disabled (by executing their
_OFF control methods).  If the current power state of the device is D3, it can
only be put into D0 this way.

However, quite often the power states of devices are changed during a
system-wide transition into a sleep state or back into the working state.  ACPI
defines four system sleep states, S1, S2, S3, and S4, and denotes the system
working state as S0.  In general, the target system sleep (or working) state
determines the highest power (lowest number) state the device can be put
into and the kernel is supposed to obtain this information by executing the
device's _SxD control method (where x is a number between 0 and 4 inclusive).
If the device is required to wake up the system from the target sleep state, the
lowest power (highest number) state it can be put into is also determined by the
target state of the system.  The kernel is then supposed to use the device's
_SxW control method to obtain the number of that state.  It also is supposed to
use the device's _PRW control method to learn which power resources need to be
enabled for the device to be able to generate wakeup signals.
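
The way _SxD and _SxW bound the choice of a device power state can be sketched
as follows.  This is only an illustration of the paragraph above, not the
kernel's implementation: the structure, the function name and the convention of
passing the already evaluated method results as plain integers are all
assumptions made for the example, and firmware that returns inconsistent values
is not handled.

```c
/* Hypothetical result type for this sketch. */
struct d_range {
	int shallowest;	/* highest-power state allowed, as reported by _SxD */
	int deepest;	/* lowest-power state allowed */
};

/*
 * sxd:    value returned by the device's _SxD method for the target state
 * sxw:    value returned by the device's _SxW method for the target state
 * wakeup: nonzero if the device is required to wake up the system
 */
static struct d_range d_state_range(int sxd, int sxw, int wakeup)
{
	struct d_range r = { .shallowest = sxd, .deepest = 3 };

	/* A wakeup device must not go deeper than _SxW allows. */
	if (wakeup && sxw < r.deepest)
		r.deepest = sxw;
	return r;
}
```

With sxd = 1 and sxw = 2, a device that has to signal wakeup may be put into D1
or D2, while the same device without the wakeup requirement may go as deep as
D3.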

1.4. Wakeup Signaling
---------------------
Wakeup signals generated by PCI devices, either as native PCI PMEs, or as
a result of the execution of the _DSW (or _PSW) ACPI control method before
putting the device into a low-power state, have to be caught and handled as
appropriate.  If they are sent while the system is in the working state
(ACPI S0), they should be translated into interrupts so that the kernel can
put the devices generating them into the full-power state and take care of the
events that triggered them.  In turn, if they are sent while the system is
sleeping, they should cause the system's core logic to trigger wakeup.

On ACPI-based systems wakeup signals sent by conventional PCI devices are
converted into ACPI General-Purpose Events (GPEs) which are hardware signals
from the system core logic generated in response to various events that need to
be acted upon.  Every GPE is associated with one or more sources of potentially
interesting events.  In particular, a GPE may be associated with a PCI device
capable of signaling wakeup.  The information on the connections between GPEs
and event sources is recorded in the system's ACPI BIOS from where it can be
read by the kernel.

If a PCI device known to the system's ACPI BIOS signals wakeup, the GPE
associated with it (if there is one) is triggered.  The GPEs associated with PCI
bridges may also be triggered in response to a wakeup signal from one of the
devices below the bridge (this also is the case for root bridges) and, for
example, native PCI PMEs from devices unknown to the system's ACPI BIOS may be
handled this way.

A GPE may be triggered when the system is sleeping (i.e. when it is in one of
the ACPI S1-S4 states), in which case system wakeup is started by its core logic
(the device that was the source of the signal causing the system wakeup to occur
may be identified later).  The GPEs used in such situations are referred to as
wakeup GPEs.

Usually, however, GPEs are also triggered when the system is in the working
state (ACPI S0) and in that case the system's core logic generates a System
Control Interrupt (SCI) to notify the kernel of the event.  Then, the SCI
handler identifies the GPE that caused the interrupt to be generated which,
in turn, allows the kernel to identify the source of the event (that may be
a PCI device signaling wakeup).  The GPEs used for notifying the kernel of
events occurring while the system is in the working state are referred to as
runtime GPEs.

Unfortunately, there is no standard way of handling wakeup signals sent by
conventional PCI devices on systems that are not ACPI-based, but there is one
for PCI Express devices.  Namely, the PCI Express Base Specification introduced
a native mechanism for converting native PCI PMEs into interrupts generated by
root ports.  For conventional PCI devices native PMEs are out-of-band, so they
are routed separately and they need not pass through bridges (in principle they
may be routed directly to the system's core logic), but for PCI Express devices
they are in-band messages that have to pass through the PCI Express hierarchy,
including the root port on the path from the device to the Root Complex.  Thus
it was possible to introduce a mechanism by which a root port generates an
interrupt whenever it receives a PME message from one of the devices below it.
The PCI Express Requester ID of the device that sent the PME message is then
recorded in one of the root port's configuration registers from where it may be
read by the interrupt handler allowing the device to be identified.  [PME
messages sent by PCI Express endpoints integrated with the Root Complex don't
pass through root ports, but instead they cause a Root Complex Event Collector
(if there is one) to generate interrupts.]
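
To illustrate how an interrupt handler might identify the source of a PME, the
sketch below decodes a value read from the root port's Root Status register.
It assumes the PCI Express layout of that register (PME Requester ID in bits
15:0, PME Status in bit 16) and the usual bus/device/function split of a
Requester ID.  The macro, structure and function names are local to this
example and the register value is passed in as a plain integer, so no real
configuration space access is shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Root Status register fields used here; the names are local to the sketch. */
#define RTSTA_PME_RID_MASK	0x0000ffffu	/* PME Requester ID, bits 15:0 */
#define RTSTA_PME_STATUS	0x00010000u	/* PME Status, bit 16 */

/* Hypothetical container for the decoded Requester ID. */
struct pme_source {
	uint8_t bus;
	uint8_t dev;
	uint8_t fn;
};

/*
 * Decode the PME source from a Root Status value.  Returns false if the
 * PME Status bit is clear, in which case *src is left untouched.
 */
static bool decode_pme_source(uint32_t rtsta, struct pme_source *src)
{
	uint16_t rid;

	if (!(rtsta & RTSTA_PME_STATUS))
		return false;

	rid = rtsta & RTSTA_PME_RID_MASK;
	src->bus = rid >> 8;		/* bits 15:8 */
	src->dev = (rid >> 3) & 0x1f;	/* bits 7:3 */
	src->fn = rid & 0x07;		/* bits 2:0 */
	return true;
}
```

A Root Status value of 0x00010308 would thus identify device 03:01.0 as the
source of the PME.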

In principle the native PCI Express PME signaling may also be used on ACPI-based
systems along with the GPEs, but to use it the kernel has to ask the system's
ACPI BIOS to release control of root port configuration registers.  The ACPI
BIOS, however, is not required to allow the kernel to control these registers
and if it doesn't do that, the kernel must not modify their contents.  Of course
the native PCI Express PME signaling cannot be used by the kernel in that case.


2. PCI Subsystem and Device Power Management
============================================

2.1. Device Power Management Callbacks
--------------------------------------
The PCI Subsystem participates in the power management of PCI devices in a
number of ways.  First of all, it provides an intermediate code layer between
the device power management core (PM core) and PCI device drivers.
Specifically, the pm field of the PCI subsystem's struct bus_type object,
pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing
pointers to several device power management callbacks:

	const struct dev_pm_ops pci_dev_pm_ops = {
		.prepare = pci_pm_prepare,
		.complete = pci_pm_complete,
		.suspend = pci_pm_suspend,
		.resume = pci_pm_resume,
		.freeze = pci_pm_freeze,
		.thaw = pci_pm_thaw,
		.poweroff = pci_pm_poweroff,
		.restore = pci_pm_restore,
		.suspend_noirq = pci_pm_suspend_noirq,
		.resume_noirq = pci_pm_resume_noirq,
		.freeze_noirq = pci_pm_freeze_noirq,
		.thaw_noirq = pci_pm_thaw_noirq,
		.poweroff_noirq = pci_pm_poweroff_noirq,
		.restore_noirq = pci_pm_restore_noirq,
		.runtime_suspend = pci_pm_runtime_suspend,
		.runtime_resume = pci_pm_runtime_resume,
		.runtime_idle = pci_pm_runtime_idle,
	};

These callbacks are executed by the PM core in various situations related to
device power management and they, in turn, execute power management callbacks
provided by PCI device drivers.  They also perform power management operations
involving some standard configuration registers of PCI devices that device
drivers need not know or care about.

The structure representing a PCI device, struct pci_dev, contains several fields
that these callbacks operate on:

	struct pci_dev {
		...
		pci_power_t	current_state;	/* Current operating state. */
		int		pm_cap;		/* PM capability offset in the
						   configuration space */
		unsigned int	pme_support:5;	/* Bitmask of states from which PME#
						   can be generated */
		unsigned int	pme_interrupt:1;/* Is native PCIe PME signaling used? */
		unsigned int	d1_support:1;	/* Low power state D1 is supported */
		unsigned int	d2_support:1;	/* Low power state D2 is supported */
		unsigned int	no_d1d2:1;	/* D1 and D2 are forbidden */
		unsigned int	wakeup_prepared:1;  /* Device prepared for wake up */
		unsigned int	d3_delay;	/* D3->D0 transition time in ms */
		...
	};

They also indirectly use some fields of the struct device that is embedded in
struct pci_dev.

2.2. Device Initialization
--------------------------
The PCI subsystem's first task related to device power management is to
prepare the device for power management and initialize the fields of struct
pci_dev used for this purpose.  This happens in two functions defined in
drivers/pci/pci.c, pci_pm_init() and platform_pci_wakeup_init().

The first of these functions checks if the device supports native PCI PM
and if that's the case the offset of its power management capability structure
in the configuration space is stored in the pm_cap field of the device's struct
pci_dev object.  Next, the function checks which PCI low-power states are
supported by the device and from which low-power states the device can generate
native PCI PMEs.  The power management fields of the device's struct pci_dev and
the struct device embedded in it are updated accordingly and the generation of
PMEs by the device is disabled.
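
The kind of decoding described above can be sketched in user space as follows.
The example takes the 16-bit Power Management Capabilities (PMC) word of the PM
capability and fills in fields analogous to d1_support, d2_support and
pme_support.  The bit positions follow the PCI PM Spec (D1 support in bit 9, D2
support in bit 10, PME support in bits 15:11), but the macro, structure and
function names are local to this sketch and it is not the code of
pci_pm_init().

```c
#include <stdint.h>

/* PMC register fields; the macro names are local to this sketch. */
#define PM_CAP_D1		0x0200u	/* D1 is supported */
#define PM_CAP_D2		0x0400u	/* D2 is supported */
#define PM_CAP_PME_MASK		0xf800u	/* states from which PME# can be sent */
#define PM_CAP_PME_SHIFT	11

/* Hypothetical subset of the power management fields of struct pci_dev. */
struct pm_caps {
	unsigned int d1_support:1;
	unsigned int d2_support:1;
	unsigned int pme_support:5;	/* bit n set: PME possible from state n
					   (bit 3 is D3hot, bit 4 is D3cold) */
};

/* Decode a PMC word that has already been read from configuration space. */
static struct pm_caps decode_pmc(uint16_t pmc)
{
	struct pm_caps caps = {
		.d1_support = !!(pmc & PM_CAP_D1),
		.d2_support = !!(pmc & PM_CAP_D2),
		.pme_support = (pmc & PM_CAP_PME_MASK) >> PM_CAP_PME_SHIFT,
	};

	return caps;
}
```

For a PMC value of 0xca00 this yields d1_support = 1, d2_support = 0 and
pme_support = 0x19, i.e. PMEs can be generated from D0, D3hot and D3cold.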

The second function checks if the device can be prepared to signal wakeup with
the help of the platform firmware, such as the ACPI BIOS.  If that is the case,
the function updates the wakeup fields in struct device embedded in the
device's struct pci_dev and uses the firmware-provided method to prevent the
device from signaling wakeup.

At this point the device is ready for power management.  For driverless devices,
however, this functionality is limited to a few basic operations carried out
during system-wide transitions to a sleep state and back to the working state.

2.3. Runtime Device Power Management
------------------------------------
The PCI subsystem plays a vital role in the runtime power management of PCI
devices.  For this purpose it uses the general runtime power management
(runtime PM) framework described in Documentation/power/runtime_pm.txt.
Namely, it provides subsystem-level callbacks:

	pci_pm_runtime_suspend()
	pci_pm_runtime_resume()
	pci_pm_runtime_idle()

that are executed by the core runtime PM routines.  It also implements the
entire mechanics necessary for handling runtime wakeup signals from PCI devices
in low-power states, which at the time of this writing works for both the native
PCI Express PME signaling and the ACPI GPE-based wakeup signaling described in
Section 1.

First, a PCI device is put into a low-power state, or suspended, with the help
of pm_schedule_suspend() or pm_runtime_suspend() which for PCI devices call
pci_pm_runtime_suspend() to do the actual job.  For this to work, the device's
driver has to provide a pm->runtime_suspend() callback (see below), which is
run by pci_pm_runtime_suspend() as the first action.  If the driver's callback
returns successfully, the device's standard configuration registers are saved,
the device is prepared to generate wakeup signals and, finally, it is put into
the target low-power state.

The low-power state to put the device into is the lowest-power (highest number)
state from which it can signal wakeup.  The exact method of signaling wakeup is
system-dependent and is determined by the PCI subsystem on the basis of the
reported capabilities of the device and the platform firmware.  To prepare the
device for signaling wakeup and put it into the selected low-power state, the
PCI subsystem can use the platform firmware as well as the device's native PCI
PM capabilities, if supported.
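
For the native PCI PM part of that choice, the selection of "the lowest-power
state from which the device can signal wakeup" can be modeled with the
pme_support bitmask alone.  The sketch below is a deliberately simplified
user-space model: the function name is made up, and input from the platform
firmware as well as the d1_support, d2_support and no_d1d2 fields are ignored.

```c
/*
 * Return the deepest state in the D0..D3hot range from which the device
 * can generate a PME, given a bitmask in which bit n means "PME can be
 * generated from state Dn" (bit 3 stands for D3hot).  D0 is returned if
 * no low-power state qualifies.
 */
static int deepest_wakeup_state(unsigned int pme_support)
{
	int state = 3;	/* start from D3hot and walk towards D0 */

	while (state > 0 && !(pme_support & (1u << state)))
		state--;

	return state;
}
```

With pme_support = 0x19 (D0, D3hot and D3cold) the result is 3, and with
pme_support = 0x06 (D1 and D2 only) it is 2.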

It is expected that the device driver's pm->runtime_suspend() callback will
not attempt to prepare the device for signaling wakeup or to put it into a
low-power state.  The driver ought to leave these tasks to the PCI subsystem
that has all of the information necessary to perform them.

A suspended device is brought back into the "active" state, or resumed,
with the help of pm_request_resume() or pm_runtime_resume() which both call
pci_pm_runtime_resume() for PCI devices.  Again, this only works if the device's
driver provides a pm->runtime_resume() callback (see below).  However, before
the driver's callback is executed, pci_pm_runtime_resume() brings the device
back into the full-power state, prevents it from signaling wakeup while in that
state and restores its standard configuration registers.  Thus the driver's
callback need not worry about the PCI-specific aspects of the device resume.

Note that generally pci_pm_runtime_resume() may be called in two different
situations.  First, it may be called at the request of the device's driver, for
example if there are some data for it to process.  Second, it may be called
as a result of a wakeup signal from the device itself (this sometimes is
referred to as "remote wakeup").  Of course, for this purpose the wakeup signal
is handled in one of the ways described in Section 1 and finally converted into
a notification for the PCI subsystem after the source device has been
identified.

The pci_pm_runtime_idle() function, called for PCI devices by pm_runtime_idle()
and pm_request_idle(), executes the device driver's pm->runtime_idle()
callback, if defined, and if that callback doesn't return an error code (or is
not present at all), suspends the device with the help of pm_runtime_suspend().
Sometimes pci_pm_runtime_idle() is called automatically by the PM core (for
example, it is called right after the device has just been resumed), in which
cases it is expected to suspend the device if that makes sense.  Usually,
however, the PCI subsystem does not know whether the device can actually be
suspended, so it lets the device's driver decide by running its
pm->runtime_idle() callback.
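
The decision made in the idle path can be modeled in user space as shown below.
The callback type, the function names and the two sample callbacks are
hypothetical stand-ins for a driver's pm->runtime_idle() callback, so this only
mirrors the rule stated above (suspend unless the callback exists and returns
an error) and is not the code of pci_pm_runtime_idle().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver's pm->runtime_idle() callback. */
typedef int (*idle_cb_t)(void);

/*
 * Return true if the device should be suspended: either the driver has no
 * idle callback at all, or the callback ran and returned 0.
 */
static bool idle_should_suspend(idle_cb_t driver_idle)
{
	if (driver_idle && driver_idle() != 0)
		return false;	/* the driver vetoed the suspend */

	return true;
}

/* Sample callbacks: one that agrees to suspend and one that refuses. */
static int idle_ok(void)
{
	return 0;
}

static int idle_busy(void)
{
	return -EBUSY;
}
```

Here idle_should_suspend(NULL) and idle_should_suspend(idle_ok) both return
true, whereas idle_should_suspend(idle_busy) returns false.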
|  | 416 |  | 
|  | 417 | 2.4. System-Wide Power Transitions | 
|  | 418 | ---------------------------------- | 
|  | 419 | There are a few different types of system-wide power transitions, described in | 
|  | 420 | Documentation/power/devices.txt.  Each of them requires devices to be handled | 
|  | 421 | in a specific way and the PM core executes subsystem-level power management | 
|  | 422 | callbacks for this purpose.  They are executed in phases such that each phase | 
|  | 423 | involves executing the same subsystem-level callback for every device belonging | 
|  | 424 | to the given subsystem before the next phase begins.  These phases always run | 
|  | 425 | after tasks have been frozen. | 
|  | 426 |  | 
|  | 427 | 2.4.1. System Suspend | 
|  | 428 |  | 
|  | 429 | When the system is going into a sleep state in which the contents of memory will | 
|  | 430 | be preserved, such as one of the ACPI sleep states S1-S3, the phases are: | 
|  | 431 |  | 
|  | 432 | prepare, suspend, suspend_noirq. | 
|  | 433 |  | 
|  | 434 | The following PCI bus type's callbacks, respectively, are used in these phases: | 
|  | 435 |  | 
|  | 436 | pci_pm_prepare() | 
|  | 437 | pci_pm_suspend() | 
|  | 438 | pci_pm_suspend_noirq() | 
|  | 439 |  | 
|  | 440 | The pci_pm_prepare() routine first puts the device into the "fully functional" | 
|  | 441 | state with the help of pm_runtime_resume().  Then, it executes the device | 
|  | 442 | driver's pm->prepare() callback if defined (i.e. if the driver's struct | 
|  | 443 | dev_pm_ops object is present and the prepare pointer in that object is valid). | 
|  | 444 |  | 
|  | 445 | The pci_pm_suspend() routine first checks if the device's driver implements | 
|  | 446 | legacy PCI suspend routines (see Section 3), in which case the driver's legacy | 
|  | 447 | suspend callback is executed, if present, and its result is returned.  Next, if | 
|  | 448 | the device's driver doesn't provide a struct dev_pm_ops object (containing | 
|  | 449 | pointers to the driver's callbacks), pci_pm_default_suspend() is called, which | 
|  | 450 | simply turns off the device's bus master capability and runs | 
|  | 451 | pcibios_disable_device() to disable it, unless the device is a bridge (PCI | 
|  | 452 | bridges are ignored by this routine).  Next, the device driver's pm->suspend() | 
|  | 453 | callback is executed, if defined, and its result is returned if it fails. | 
|  | 454 | Finally, pci_fixup_device() is called to apply hardware suspend quirks related | 
|  | 455 | to the device if necessary. | 
|  | 456 |  | 
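The decision flow of pci_pm_suspend() described above can be sketched as a
userspace model.  Everything here is an assumption made for illustration: the
suspend_model_* types and functions only mimic the description in this
document, they are not the kernel's data structures, and -5 is an arbitrary
negative error value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these types only mimic the kernel's. */
struct suspend_model_pm_ops {
	int (*suspend)(void);		/* stands in for pm->suspend() */
};

struct suspend_model_dev {
	bool has_legacy_pm;		/* driver uses legacy PCI callbacks */
	int (*legacy_suspend)(void);	/* may be NULL */
	const struct suspend_model_pm_ops *pm;	/* NULL: no dev_pm_ops */
	bool is_bridge;
	bool bus_master;
	bool enabled;
	bool quirks_applied;
};

/* Example driver callback that always fails (arbitrary error value). */
int example_suspend_fail(void)
{
	return -5;
}

/* Mirrors the described default handling: bridges are left alone. */
static void suspend_model_default(struct suspend_model_dev *dev)
{
	if (dev->is_bridge)
		return;
	dev->bus_master = false;
	dev->enabled = false;
}

int suspend_model_run(struct suspend_model_dev *dev)
{
	assert(dev != NULL);

	/* 1. Legacy drivers: run their callback and return its result. */
	if (dev->has_legacy_pm)
		return dev->legacy_suspend ? dev->legacy_suspend() : 0;

	if (!dev->pm) {
		/* 2. No dev_pm_ops object: default handling. */
		suspend_model_default(dev);
	} else if (dev->pm->suspend) {
		/* 3. Driver callback: a failure returns before the quirks. */
		int error = dev->pm->suspend();

		if (error)
			return error;
	}

	/* 4. Suspend quirks are applied last. */
	dev->quirks_applied = true;
	return 0;
}
```

In this model a failing driver callback returns its error before any quirks are
applied, and a bridge without a dev_pm_ops object keeps its bus master setting.
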
|  | 457 | Note that the suspend phase is carried out asynchronously for PCI devices, so | 
|  | 458 | the pci_pm_suspend() callback may be executed in parallel for any pair of PCI | 
|  | 459 | devices that don't depend on each other in a known way (i.e. none of the paths | 
|  | 460 | in the device tree from the root bridge to a leaf device contains both of them). | 
|  | 461 |  | 
|  | 462 | The pci_pm_suspend_noirq() routine is executed after suspend_device_irqs() has | 
|  | 463 | been called, which means that the device driver's interrupt handler won't be | 
|  | 464 | invoked while this routine is running.  It first checks if the device's driver | 
|  | 465 | implements legacy PCI suspend routines (Section 3), in which case the legacy | 
|  | 466 | late suspend routine is called and its result is returned (the standard | 
|  | 467 | configuration registers of the device are saved if the driver's callback hasn't | 
|  | 468 | done that).  Second, if the device driver's struct dev_pm_ops object is not | 
|  | 469 | present, the device's standard configuration registers are saved and the routine | 
|  | 470 | returns success.  Otherwise the device driver's pm->suspend_noirq() callback is | 
|  | 471 | executed, if present, and its result is returned if it fails.  Next, if the | 
|  | 472 | device's standard configuration registers haven't been saved yet (one of the | 
|  | 473 | device driver's callbacks executed before might do that), pci_pm_suspend_noirq() | 
|  | 474 | saves them, prepares the device to signal wakeup (if necessary) and puts it into | 
|  | 475 | a low-power state. | 
|  | 476 |  | 
|  | 477 | The low-power state to put the device into is the lowest-power (highest number) | 
|  | 478 | state from which it can signal wakeup while the system is in the target sleep | 
|  | 479 | state.  Just like in the runtime PM case described above, the mechanism of | 
|  | 480 | signaling wakeup is system-dependent and determined by the PCI subsystem, which | 
|  | 481 | is also responsible for preparing the device to signal wakeup from the system's | 
|  | 482 | target sleep state as appropriate. | 
|  | 483 |  | 
|  | 484 | PCI device drivers (that don't implement legacy power management callbacks) are | 
|  | 485 | generally not expected to prepare devices for signaling wakeup or to put them | 
|  | 486 | into low-power states.  However, if one of the driver's suspend callbacks | 
|  | 487 | (pm->suspend() or pm->suspend_noirq()) saves the device's standard configuration | 
|  | 488 | registers, pci_pm_suspend_noirq() will assume that the device has been prepared | 
|  | 489 | to signal wakeup and put into a low-power state by the driver (the driver is | 
|  | 490 | then assumed to have used the helper functions provided by the PCI subsystem for | 
|  | 491 | this purpose).  PCI device drivers are not encouraged to do that, but in some | 
|  | 492 | rare cases doing that in the driver may be the optimum approach. | 
|  | 493 |  | 
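The rule that a driver saving the configuration registers takes over wakeup
preparation and the low-power transition can be modeled in userspace as
follows.  This is a sketch under stated assumptions: the noirq_model_* names
are hypothetical, the legacy-callback branch is left out, and wakeup preparation
is modeled as unconditional although the text says it happens only if
necessary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are not kernel APIs. */
struct noirq_model_dev;

struct noirq_model_pm_ops {
	int (*suspend_noirq)(struct noirq_model_dev *dev);
};

struct noirq_model_dev {
	const struct noirq_model_pm_ops *pm;	/* NULL: no dev_pm_ops */
	bool state_saved;		/* config registers already saved */
	bool wakeup_prepared;		/* set by the subsystem model only */
	bool low_power;			/* set by the subsystem model only */
};

/* Example driver callback that saves the registers itself. */
int example_noirq_saves_state(struct noirq_model_dev *dev)
{
	dev->state_saved = true;
	return 0;
}

int noirq_model_run(struct noirq_model_dev *dev)
{
	assert(dev != NULL);

	/* No dev_pm_ops object: only save the registers and succeed. */
	if (!dev->pm) {
		dev->state_saved = true;
		return 0;
	}

	if (dev->pm->suspend_noirq) {
		int error = dev->pm->suspend_noirq(dev);

		if (error)
			return error;
	}

	/*
	 * Registers not saved by the driver: the subsystem saves them,
	 * prepares wakeup and enters the low-power state.  If the driver
	 * did save them, it is assumed to have handled the rest as well.
	 */
	if (!dev->state_saved) {
		dev->state_saved = true;
		dev->wakeup_prepared = true;
		dev->low_power = true;
	}
	return 0;
}
```

The model makes the consequence visible: once the driver's callback sets
state_saved, the subsystem part leaves wakeup_prepared and low_power untouched,
so the driver has to take care of both.
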
|  | 494 | 2.4.2. System Resume | 
|  | 495 |  | 
|  | 496 | When the system is undergoing a transition from a sleep state in which the | 
|  | 497 | contents of memory have been preserved, such as one of the ACPI sleep states | 
|  | 498 | S1-S3, into the working state (ACPI S0), the phases are: | 
|  | 499 |  | 
|  | 500 | resume_noirq, resume, complete. | 
|  | 501 |  | 
|  | 502 | The following PCI bus type's callbacks, respectively, are executed in these | 
|  | 503 | phases: | 
|  | 504 |  | 
|  | 505 | pci_pm_resume_noirq() | 
|  | 506 | pci_pm_resume() | 
|  | 507 | pci_pm_complete() | 
|  | 508 |  | 
|  | 509 | The pci_pm_resume_noirq() routine first puts the device into the full-power | 
|  | 510 | state, restores its standard configuration registers and applies early resume | 
|  | 511 | hardware quirks related to the device, if necessary.  This is done | 
|  | 512 | unconditionally, regardless of whether or not the device's driver implements | 
|  | 513 | legacy PCI power management callbacks (this way all PCI devices are in the | 
|  | 514 | full-power state and their standard configuration registers have been restored | 
|  | 515 | when their interrupt handlers are invoked for the first time during resume, | 
|  | 516 | which allows the kernel to avoid problems with the handling of shared interrupts | 
|  | 517 | by drivers whose devices are still suspended).  If legacy PCI power management | 
|  | 518 | callbacks (see Section 3) are implemented by the device's driver, the legacy | 
|  | 519 | early resume callback is executed and its result is returned.  Otherwise, the | 
|  | 520 | device driver's pm->resume_noirq() callback is executed, if defined, and its | 
|  | 521 | result is returned. | 
|  | 522 |  | 
|  | 523 | The pci_pm_resume() routine first checks if the device's standard configuration | 
|  | 524 | registers have been restored and restores them if that's not the case (this | 
|  | 525 | is only necessary in the error path during a failing suspend).  Next, resume | 
|  | 526 | hardware quirks related to the device are applied, if necessary, and if the | 
|  | 527 | device's driver implements legacy PCI power management callbacks (see | 
|  | 528 | Section 3), the driver's legacy resume callback is executed and its result is | 
|  | 529 | returned.  Otherwise, the device's wakeup signaling mechanisms are blocked and | 
|  | 530 | its driver's pm->resume() callback is executed, if defined (the callback's | 
|  | 531 | result is then returned). | 
|  | 532 |  | 
|  | 533 | The resume phase is carried out asynchronously for PCI devices, like the | 
|  | 534 | suspend phase described above, which means that if two PCI devices don't depend | 
|  | 535 | on each other in a known way, the pci_pm_resume() routine may be executed for | 
|  | 536 | both of them in parallel. | 
|  | 537 |  | 
|  | 538 | The pci_pm_complete() routine only executes the device driver's pm->complete() | 
|  | 539 | callback, if defined. | 
|  | 540 |  | 
|  | 541 | 2.4.3. System Hibernation | 
|  | 542 |  | 
|  | 543 | System hibernation is more complicated than system suspend, because it requires | 
|  | 544 | a system image to be created and written into a persistent storage medium.  The | 
|  | 545 | image is created atomically and all devices are quiesced, or frozen, before that | 
|  | 546 | happens. | 
|  | 547 |  | 
|  | 548 | The freezing of devices is carried out after enough memory has been freed (at | 
|  | 549 | the time of this writing the image creation requires at least 50% of system RAM | 
|  | 550 | to be free) in the following three phases: | 
|  | 551 |  | 
|  | 552 | prepare, freeze, freeze_noirq | 
|  | 553 |  | 
|  | 554 | that correspond to the PCI bus type's callbacks: | 
|  | 555 |  | 
|  | 556 | pci_pm_prepare() | 
|  | 557 | pci_pm_freeze() | 
|  | 558 | pci_pm_freeze_noirq() | 
|  | 559 |  | 
|  | 560 | This means that the prepare phase is exactly the same as for system suspend. | 
|  | 561 | The other two phases, however, are different. | 
|  | 562 |  | 
|  | 563 | The pci_pm_freeze() routine is quite similar to pci_pm_suspend(), but it runs | 
|  | 564 | the device driver's pm->freeze() callback, if defined, instead of pm->suspend(), | 
|  | 565 | and it doesn't apply the suspend-related hardware quirks.  It is executed | 
|  | 566 | asynchronously for different PCI devices that don't depend on each other in a | 
|  | 567 | known way. | 
|  | 568 |  | 
|  | 569 | The pci_pm_freeze_noirq() routine, in turn, is similar to | 
|  | 570 | pci_pm_suspend_noirq(), but it calls the device driver's pm->freeze_noirq() | 
|  | 571 | routine instead of pm->suspend_noirq().  It also doesn't attempt to prepare the | 
|  | 572 | device for signaling wakeup and put it into a low-power state.  Still, it saves | 
|  | 573 | the device's standard configuration registers if they haven't been saved by one | 
|  | 574 | of the driver's callbacks. | 
|  | 575 |  | 
|  | 576 | Once the image has been created, it has to be saved.  However, at this point all | 
|  | 577 | devices are frozen and they cannot handle I/O, while their ability to handle | 
|  | 578 | I/O is obviously necessary for the image saving.  Thus they have to be brought | 
|  | 579 | back to the fully functional state and this is done in the following phases: | 
|  | 580 |  | 
|  | 581 | thaw_noirq, thaw, complete | 
|  | 582 |  | 
|  | 583 | using the following PCI bus type's callbacks: | 
|  | 584 |  | 
|  | 585 | pci_pm_thaw_noirq() | 
|  | 586 | pci_pm_thaw() | 
|  | 587 | pci_pm_complete() | 
|  | 588 |  | 
|  | 589 | respectively. | 
|  | 590 |  | 
|  | 591 | The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq(), | 
|  | 592 | but it doesn't put the device into the full power state and doesn't attempt to | 
|  | 593 | restore its standard configuration registers.  It also executes the device | 
|  | 594 | driver's pm->thaw_noirq() callback, if defined, instead of pm->resume_noirq(). | 
|  | 595 |  | 
|  | 596 | The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the device | 
|  | 597 | driver's pm->thaw() callback instead of pm->resume().  It is executed | 
|  | 598 | asynchronously for different PCI devices that don't depend on each other in a | 
|  | 599 | known way. | 
|  | 600 |  | 
|  | 601 | The complete phase is the same as for system resume. | 
|  | 602 |  | 
|  | 603 | After saving the image, devices need to be powered down before the system can | 
|  | 604 | enter the target sleep state (ACPI S4 for ACPI-based systems).  This is done in | 
|  | 605 | three phases: | 
|  | 606 |  | 
|  | 607 | prepare, poweroff, poweroff_noirq | 
|  | 608 |  | 
|  | 609 | where the prepare phase is exactly the same as for system suspend.  The other | 
|  | 610 | two phases are analogous to the suspend and suspend_noirq phases, respectively. | 
|  | 611 | The PCI subsystem-level callbacks they correspond to | 
|  | 612 |  | 
|  | 613 | pci_pm_poweroff() | 
|  | 614 | pci_pm_poweroff_noirq() | 
|  | 615 |  | 
|  | 616 | work in analogy with pci_pm_suspend() and pci_pm_suspend_noirq(), respectively, | 
|  | 617 | although they don't attempt to save the device's standard configuration | 
|  | 618 | registers. | 
|  | 619 |  | 
|  | 620 | 2.4.4. System Restore | 
|  | 621 |  | 
|  | 622 | System restore requires a hibernation image to be loaded into memory and the | 
|  | 623 | pre-hibernation memory contents to be restored before the pre-hibernation system | 
|  | 624 | activity can be resumed. | 
|  | 625 |  | 
|  | 626 | As described in Documentation/power/devices.txt, the hibernation image is loaded | 
|  | 627 | into memory by a fresh instance of the kernel, called the boot kernel, which in | 
|  | 628 | turn is loaded and run by a boot loader in the usual way.  After the boot kernel | 
|  | 629 | has loaded the image, it needs to replace its own code and data with the code | 
|  | 630 | and data of the "hibernated" kernel stored within the image, called the image | 
|  | 631 | kernel.  For this purpose all devices are frozen just like before creating | 
|  | 632 | the image during hibernation, in the | 
|  | 633 |  | 
|  | 634 | prepare, freeze, freeze_noirq | 
|  | 635 |  | 
|  | 636 | phases described above.  However, the devices affected by these phases are only | 
|  | 637 | those having drivers in the boot kernel; other devices will still be in whatever | 
|  | 638 | state the boot loader left them. | 
|  | 639 |  | 
|  | 640 | Should the restoration of the pre-hibernation memory contents fail, the boot | 
|  | 641 | kernel would go through the "thawing" procedure described above, using the | 
|  | 642 | thaw_noirq, thaw, and complete phases (that will only affect the devices having | 
|  | 643 | drivers in the boot kernel), and then continue running normally. | 
|  | 644 |  | 
|  | 645 | If the pre-hibernation memory contents are restored successfully, which is the | 
|  | 646 | usual situation, control is passed to the image kernel, which then becomes | 
|  | 647 | responsible for bringing the system back to the working state.  To achieve this, | 
|  | 648 | it must restore the devices' pre-hibernation functionality, which is done much | 
|  | 649 | like waking up from the memory sleep state, although it involves different | 
|  | 650 | phases: | 
|  | 651 |  | 
|  | 652 | restore_noirq, restore, complete | 
|  | 653 |  | 
|  | 654 | The first two of these are analogous to the resume_noirq and resume phases | 
|  | 655 | described above, respectively, and correspond to the following PCI subsystem | 
|  | 656 | callbacks: | 
|  | 657 |  | 
|  | 658 | pci_pm_restore_noirq() | 
|  | 659 | pci_pm_restore() | 
|  | 660 |  | 
|  | 661 | These callbacks work in analogy with pci_pm_resume_noirq() and pci_pm_resume(), | 
|  | 662 | respectively, but they execute the device driver's pm->restore_noirq() and | 
|  | 663 | pm->restore() callbacks, if available. | 
|  | 664 |  | 
|  | 665 | The complete phase is carried out in exactly the same way as during system | 
|  | 666 | resume. | 
|  | 667 |  | 
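The phase sequences listed in Sections 2.4.1 through 2.4.4 can be collected
into one lookup table.  The table below is only a reading aid: the phase names
are taken from the text above, while the transition labels, the struct and the
phases_for() helper are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table; only the phase names come from the text. */
struct transition_phases {
	const char *transition;
	const char *phases[4];	/* NULL-terminated */
};

const struct transition_phases model_phase_table[] = {
	{ "suspend",             { "prepare", "suspend", "suspend_noirq", NULL } },
	{ "resume",              { "resume_noirq", "resume", "complete", NULL } },
	{ "hibernate: freeze",   { "prepare", "freeze", "freeze_noirq", NULL } },
	{ "hibernate: thaw",     { "thaw_noirq", "thaw", "complete", NULL } },
	{ "hibernate: poweroff", { "prepare", "poweroff", "poweroff_noirq", NULL } },
	{ "restore: freeze",     { "prepare", "freeze", "freeze_noirq", NULL } },
	{ "restore: restore",    { "restore_noirq", "restore", "complete", NULL } },
};

/* Return the NULL-terminated phase list for a label, or NULL if unknown. */
const char *const *phases_for(const char *transition)
{
	size_t n = sizeof(model_phase_table) / sizeof(model_phase_table[0]);

	assert(transition != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(model_phase_table[i].transition, transition) == 0)
			return model_phase_table[i].phases;
	return NULL;
}
```

For example, phases_for("hibernate: poweroff") yields the prepare, poweroff and
poweroff_noirq phases in that order.
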
|  | 668 |  | 
|  | 669 | 3. PCI Device Drivers and Power Management | 
|  | 670 | ========================================== | 
|  | 671 |  | 
|  | 672 | 3.1. Power Management Callbacks | 
|  | 673 | ------------------------------- | 
|  | 674 | PCI device drivers participate in power management by providing callbacks to be | 
|  | 675 | executed by the PCI subsystem's power management routines described above and by | 
|  | 676 | controlling the runtime power management of their devices. | 
|  | 677 |  | 
|  | 678 | At the time of this writing there are two ways to define power management | 
|  | 679 | callbacks for a PCI device driver, the recommended one, based on using a | 
|  | 680 | dev_pm_ops structure described in Documentation/power/devices.txt, and the | 
|  | 681 | "legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and | 
|  | 682 | .resume() callbacks from struct pci_driver are used.  The legacy approach, | 
|  | 683 | however, doesn't allow one to define runtime power management callbacks and is | 
|  | 684 | not really suitable for any new drivers.  Therefore it is not covered by this | 
|  | 685 | document (refer to the source code to learn more about it). | 
|  | 686 |  | 
|  | 687 | It is recommended that all PCI device drivers define a struct dev_pm_ops object | 
|  | 688 | containing pointers to power management (PM) callbacks that will be executed by | 
|  | 689 | the PCI subsystem's PM routines in various circumstances.  A pointer to the | 
|  | 690 | driver's struct dev_pm_ops object has to be assigned to the driver.pm field in | 
|  | 691 | its struct pci_driver object.  Once that has happened, the "legacy" PM callbacks | 
|  | 692 | in struct pci_driver are ignored (even if they are not NULL). | 
|  | 693 |  | 
|  | 694 | The PM callbacks in struct dev_pm_ops are not mandatory and if they are not | 
|  | 695 | defined (i.e. the respective fields of struct dev_pm_ops are unset) the PCI | 
|  | 696 | subsystem will handle the device in a simplified default manner.  If they are | 
|  | 697 | defined, though, they are expected to behave as described in the following | 
|  | 698 | subsections. | 
|  | 699 |  | 
|  | 700 | 3.1.1. prepare() | 
|  | 701 |  | 
|  | 702 | The prepare() callback is executed during system suspend, during hibernation | 
|  | 703 | (when a hibernation image is about to be created), during power-off after | 
|  | 704 | saving a hibernation image and during system restore, when a hibernation image | 
|  | 705 | has just been loaded into memory. | 
|  | 706 |  | 
|  | 707 | This callback is only necessary if the driver's device has children that in | 
|  | 708 | general may be registered at any time.  In that case the role of the prepare() | 
|  | 709 | callback is to prevent new children of the device from being registered until | 
|  | 710 | one of the resume_noirq(), thaw_noirq(), or restore_noirq() callbacks is run. | 
|  | 711 |  | 
|  | 712 | In addition to that the prepare() callback may carry out some operations | 
|  | 713 | preparing the device to be suspended, although it should not allocate memory | 
|  | 714 | (if additional memory is required to suspend the device, it has to be | 
|  | 715 | preallocated earlier, for example in a suspend/hibernate notifier as described | 
|  | 716 | in Documentation/power/notifiers.txt). | 
|  | 717 |  | 
|  | 718 | 3.1.2. suspend() | 
|  | 719 |  | 
|  | 720 | The suspend() callback is only executed during system suspend, after prepare() | 
|  | 721 | callbacks have been executed for all devices in the system. | 
|  | 722 |  | 
|  | 723 | This callback is expected to quiesce the device and prepare it to be put into a | 
|  | 724 | low-power state by the PCI subsystem.  It is not required (in fact it is not | 
|  | 725 | even recommended) that a PCI driver's suspend() callback save the standard | 
|  | 726 | configuration registers of the device, prepare it for waking up the system, or | 
|  | 727 | put it into a low-power state.  All of these operations can very well be taken | 
|  | 728 | care of by the PCI subsystem, without the driver's participation. | 
|  | 729 |  | 
|  | 730 | However, in some rare cases it is convenient to carry out these operations in | 
|  | 731 | a PCI driver.  Then, pci_save_state(), pci_prepare_to_sleep(), and | 
|  | 732 | pci_set_power_state() should be used to save the device's standard configuration | 
|  | 733 | registers, to prepare it for system wakeup (if necessary), and to put it into a | 
|  | 734 | low-power state, respectively.  Moreover, if the driver calls pci_save_state(), | 
|  | 735 | the PCI subsystem will not execute either pci_prepare_to_sleep() or | 
|  | 736 | pci_set_power_state() for its device, so the driver is then responsible for | 
|  | 737 | handling the device as appropriate. | 
|  | 738 |  | 
|  | 739 | While the suspend() callback is being executed, the driver's interrupt handler | 
|  | 740 | can be invoked to handle an interrupt from the device, so all suspend-related | 
|  | 741 | operations relying on the driver's ability to handle interrupts should be | 
|  | 742 | carried out in this callback. | 
|  | 743 |  | 
|  | 744 | 3.1.3. suspend_noirq() | 
|  | 745 |  | 
|  | 746 | The suspend_noirq() callback is only executed during system suspend, after | 
|  | 747 | suspend() callbacks have been executed for all devices in the system and | 
|  | 748 | after device interrupts have been disabled by the PM core. | 
|  | 749 |  | 
|  | 750 | The difference between suspend_noirq() and suspend() is that the driver's | 
|  | 751 | interrupt handler will not be invoked while suspend_noirq() is running.  Thus | 
|  | 752 | suspend_noirq() can carry out operations that would cause race conditions to | 
|  | 753 | arise if they were performed in suspend(). | 
|  | 754 |  | 
|  | 755 | 3.1.4. freeze() | 
|  | 756 |  | 
|  | 757 | The freeze() callback is hibernation-specific and is executed in two situations: | 
|  | 758 | during hibernation, after prepare() callbacks have been executed for all devices | 
|  | 759 | in preparation for the creation of a system image, and during restore, | 
|  | 760 | after a system image has been loaded into memory from persistent storage and the | 
|  | 761 | prepare() callbacks have been executed for all devices. | 
|  | 762 |  | 
|  | 763 | The role of this callback is analogous to the role of the suspend() callback | 
|  | 764 | described above.  In fact, they only need to be different in the rare cases when | 
|  | 765 | the driver takes the responsibility for putting the device into a low-power | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 766 | state. | 
|  | 767 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 768 | In those cases the freeze() callback should not prepare the device for system | 
|  | 769 | wakeup or put it into a low-power state.  Still, either it or freeze_noirq() | 
|  | 770 | should save the device's standard configuration registers using pci_save_state(). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 771 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 772 | 3.1.5. freeze_noirq() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 773 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 774 | The freeze_noirq() callback is hibernation-specific.  It is executed during | 
|  | 775 | hibernation, after prepare() and freeze() callbacks have been executed for all | 
|  | 776 | devices in preparation for the creation of a system image, and during restore, | 
|  | 777 | after a system image has been loaded into memory and after prepare() and | 
|  | 778 | freeze() callbacks have been executed for all devices.  It is always executed | 
|  | 779 | after device interrupts have been disabled by the PM core. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 780 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 781 | The role of this callback is analogous to the role of the suspend_noirq() | 
|  | 782 | callback described above and it is very rarely necessary to define | 
|  | 783 | freeze_noirq(). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 784 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 785 | The difference between freeze_noirq() and freeze() is analogous to the | 
|  | 786 | difference between suspend_noirq() and suspend(). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 787 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 788 | 3.1.6. poweroff() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 789 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 790 | The poweroff() callback is hibernation-specific.  It is executed when the system | 
|  | 791 | is about to be powered off after saving a hibernation image to persistent | 
|  | 792 | storage.  The prepare() callbacks are executed for all devices before | 
|  | 793 | poweroff() is called. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 794 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 795 | The role of this callback is analogous to the role of the suspend() and freeze() | 
|  | 796 | callbacks described above, although it does not need to save the contents of | 
|  | 797 | the device's registers.  In particular, if the driver wants to put the device | 
|  | 798 | into a low-power state itself instead of allowing the PCI subsystem to do that, | 
|  | 799 | the poweroff() callback should use pci_prepare_to_sleep() and | 
|  | 800 | pci_set_power_state() to prepare the device for system wakeup and to put it | 
|  | 801 | into a low-power state, respectively, but it need not save the device's standard | 
|  | 802 | configuration registers. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 803 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 804 | 3.1.7. poweroff_noirq() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 805 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 806 | The poweroff_noirq() callback is hibernation-specific.  It is executed after | 
|  | 807 | poweroff() callbacks have been executed for all devices in the system. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 808 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 809 | The role of this callback is analogous to the role of the suspend_noirq() and | 
|  | 810 | freeze_noirq() callbacks described above, but it does not need to save the | 
|  | 811 | contents of the device's registers. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 812 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 813 | The difference between poweroff_noirq() and poweroff() is analogous to the | 
|  | 814 | difference between suspend_noirq() and suspend(). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 815 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 816 | 3.1.8. resume_noirq() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 817 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 818 | The resume_noirq() callback is only executed during system resume, after the | 
|  | 819 | PM core has enabled the non-boot CPUs.  The driver's interrupt handler will not | 
|  | 820 | be invoked while resume_noirq() is running, so this callback can carry out | 
|  | 821 | operations that might race with the interrupt handler. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 822 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 823 | Since the PCI subsystem unconditionally puts all devices into the full power | 
|  | 824 | state in the resume_noirq phase of system resume and restores their standard | 
|  | 825 | configuration registers, resume_noirq() is usually not necessary.  In general | 
|  | 826 | it should only be used for performing operations that would lead to race | 
|  | 827 | conditions if carried out by resume(). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 828 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 829 | 3.1.9. resume() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 830 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 831 | The resume() callback is only executed during system resume, after | 
|  | 832 | resume_noirq() callbacks have been executed for all devices in the system and | 
|  | 833 | device interrupts have been enabled by the PM core. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 834 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 835 | This callback is responsible for restoring the pre-suspend configuration of the | 
|  | 836 | device and bringing it back to the fully functional state.  The device should be | 
|  | 837 | able to process I/O in the usual way after resume() has returned. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 838 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 839 | 3.1.10. thaw_noirq() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 840 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 841 | The thaw_noirq() callback is hibernation-specific.  It is executed after a | 
|  | 842 | system image has been created and the non-boot CPUs have been enabled by the PM | 
|  | 843 | core, in the thaw_noirq phase of hibernation.  It also may be executed if the | 
|  | 844 | loading of a hibernation image fails during system restore (it is then executed | 
|  | 845 | after enabling the non-boot CPUs).  The driver's interrupt handler will not be | 
|  | 846 | invoked while thaw_noirq() is running. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 847 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 848 | The role of this callback is analogous to the role of resume_noirq().  The | 
|  | 849 | difference between these two callbacks is that thaw_noirq() is executed after | 
|  | 850 | freeze() and freeze_noirq(), so in general it does not need to modify the | 
|  | 851 | contents of the device's registers. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 852 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 853 | 3.1.11. thaw() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 854 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 855 | The thaw() callback is hibernation-specific.  It is executed after thaw_noirq() | 
|  | 856 | callbacks have been executed for all devices in the system and after device | 
|  | 857 | interrupts have been enabled by the PM core. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 858 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 859 | This callback is responsible for restoring the pre-freeze configuration of | 
|  | 860 | the device, so that it will work in the usual way after thaw() has returned. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 861 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 862 | 3.1.12. restore_noirq() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 863 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 864 | The restore_noirq() callback is hibernation-specific.  It is executed in the | 
|  | 865 | restore_noirq phase of hibernation, when the boot kernel has passed control to | 
|  | 866 | the image kernel and the non-boot CPUs have been enabled by the image kernel's | 
|  | 867 | PM core. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 868 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 869 | This callback is analogous to resume_noirq() with the exception that it cannot | 
|  | 870 | make any assumption about the previous state of the device, even if the BIOS (or | 
|  | 871 | generally the platform firmware) is known to preserve that state over a | 
|  | 872 | suspend-resume cycle. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 873 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 874 | For the vast majority of PCI device drivers there is no difference between | 
|  | 875 | resume_noirq() and restore_noirq(). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 876 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 877 | 3.1.13. restore() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 878 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 879 | The restore() callback is hibernation-specific.  It is executed after | 
|  | 880 | restore_noirq() callbacks have been executed for all devices in the system and | 
|  | 881 | after the PM core has enabled device drivers' interrupt handlers to be invoked. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 882 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 883 | This callback is analogous to resume(), just like restore_noirq() is analogous | 
|  | 884 | to resume_noirq().  Consequently, the difference between restore_noirq() and | 
|  | 885 | restore() is analogous to the difference between resume_noirq() and resume(). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 886 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 887 | For the vast majority of PCI device drivers there is no difference between | 
|  | 888 | resume() and restore(). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 889 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 890 | 3.1.14. complete() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 891 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 892 | The complete() callback is executed in the following situations: | 
|  | 893 | - during system resume, after resume() callbacks have been executed for all | 
|  | 894 | devices, | 
|  | 895 | - during hibernation, before saving the system image, after thaw() callbacks | 
|  | 896 | have been executed for all devices, | 
|  | 897 | - during system restore, when the system is going back to its pre-hibernation | 
|  | 898 | state, after restore() callbacks have been executed for all devices. | 
|  | 899 | It also may be executed if the loading of a hibernation image into memory fails | 
|  | 900 | (in that case it is run after thaw() callbacks have been executed for all | 
|  | 901 | devices that have drivers in the boot kernel). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 902 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 903 | This callback is entirely optional, although it may be necessary if the | 
|  | 904 | prepare() callback performs operations that need to be reversed. | 
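
The pairing of prepare() and complete() can be sketched as follows.  This is a
userspace model only, not kernel code: struct model_dev, its
new_children_blocked flag and model_transition() are hypothetical names
introduced for illustration, and the sequencing done by the PM core is reduced
to a single function.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <stdbool.h>

struct model_dev {
	bool new_children_blocked;	/* state set up by prepare() */
};

/* prepare(): keep new child devices from being registered. */
static int foo_prepare(struct model_dev *dev)
{
	dev->new_children_blocked = true;
	return 0;
}

/* complete(): reverse whatever prepare() did. */
static void foo_complete(struct model_dev *dev)
{
	dev->new_children_blocked = false;
}

/* One transition as the PM core might sequence it (heavily simplified). */
static bool model_transition(struct model_dev *dev)
{
	if (foo_prepare(dev))
		return false;
	/* ... suspend/resume, freeze/thaw or restore phases run here ... */
	foo_complete(dev);
	return !dev->new_children_blocked;
}
```

A driver whose prepare() changes no state that outlives the transition would
have nothing to undo here and could leave complete() unset.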
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 905 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 906 | 3.1.15. runtime_suspend() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 907 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 908 | The runtime_suspend() callback is specific to device runtime power management | 
|  | 909 | (runtime PM).  It is executed by the PM core's runtime PM framework when the | 
|  | 910 | device is about to be suspended (i.e. quiesced and put into a low-power state) | 
|  | 911 | at run time. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 912 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 913 | This callback is responsible for freezing the device and preparing it to be | 
|  | 914 | put into a low-power state, but it must allow the PCI subsystem to perform all | 
|  | 915 | of the PCI-specific actions necessary for suspending the device. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 916 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 917 | 3.1.16. runtime_resume() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 918 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 919 | The runtime_resume() callback is specific to device runtime PM.  It is executed | 
|  | 920 | by the PM core's runtime PM framework when the device is about to be resumed | 
|  | 921 | (i.e. put into the full-power state and programmed to process I/O normally) at | 
|  | 922 | run time. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 923 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 924 | This callback is responsible for restoring the normal functionality of the | 
|  | 925 | device after it has been put into the full-power state by the PCI subsystem. | 
|  | 926 | The device is expected to be able to process I/O in the usual way after | 
|  | 927 | runtime_resume() has returned. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 928 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 929 | 3.1.17. runtime_idle() | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 930 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 931 | The runtime_idle() callback is specific to device runtime PM.  It is executed | 
|  | 932 | by the PM core's runtime PM framework whenever it may be desirable to suspend | 
|  | 933 | the device according to the PM core's information.  In particular, it is | 
|  | 934 | automatically executed right after runtime_resume() has returned in case the | 
|  | 935 | resume of the device has happened as a result of a spurious event. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 936 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 937 | This callback is optional, but if it is not implemented or if it returns 0, the | 
|  | 938 | PCI subsystem will call pm_runtime_suspend() for the device, which in turn will | 
|  | 939 | cause the driver's runtime_suspend() callback to be executed. | 
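
The rule above can be modeled in userspace roughly as shown below.  This is a
sketch under stated assumptions, not the PCI subsystem's code: struct
model_dev, struct model_pm_ops, model_runtime_suspend() and
model_bus_runtime_idle() are hypothetical stand-ins, and the choice of -EBUSY
as the driver's "do not suspend" return value is an assumption made for the
example.

```c
/* Userspace model only; the model_* names are hypothetical. */
#include <errno.h>
#include <stddef.h>

struct model_dev;

struct model_pm_ops {
	int (*runtime_suspend)(struct model_dev *dev);
	int (*runtime_idle)(struct model_dev *dev);
};

struct model_dev {
	const struct model_pm_ops *pm;
	int in_use;		/* nonzero while the device is busy */
	int suspended;		/* set once runtime_suspend() has succeeded */
};

/* Stand-in for pm_runtime_suspend(): runs the driver's callback. */
static int model_runtime_suspend(struct model_dev *dev)
{
	int ret = dev->pm->runtime_suspend ? dev->pm->runtime_suspend(dev) : 0;

	if (!ret)
		dev->suspended = 1;
	return ret;
}

/* Bus-level idle handling: a missing or zero-returning runtime_idle()
 * leads to a suspend attempt; a nonzero return value prevents it. */
static int model_bus_runtime_idle(struct model_dev *dev)
{
	if (dev->pm->runtime_idle) {
		int ret = dev->pm->runtime_idle(dev);

		if (ret)
			return ret;
	}
	return model_runtime_suspend(dev);
}

/* Example driver callbacks. */
static int foo_runtime_idle(struct model_dev *dev)
{
	return dev->in_use ? -EBUSY : 0;
}

static int foo_runtime_suspend(struct model_dev *dev)
{
	(void)dev;		/* nothing to quiesce in the model */
	return 0;
}

static const struct model_pm_ops foo_pm_ops = {
	.runtime_suspend = foo_runtime_suspend,
	.runtime_idle = foo_runtime_idle,
};

/* A driver that leaves runtime_idle() unset. */
static const struct model_pm_ops foo_no_idle_pm_ops = {
	.runtime_suspend = foo_runtime_suspend,
};
```

With foo_no_idle_pm_ops the model suspends the device even while in_use is
set, which is why a driver that cannot tolerate that needs either a
runtime_idle() callback or a check inside runtime_suspend().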
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 940 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 941 | 3.1.18. Pointing Multiple Callback Pointers to One Routine | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 942 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 943 | Although in principle each of the callbacks described in the previous | 
|  | 944 | subsections can be defined as a separate function, it often is convenient to | 
|  | 945 | point two or more members of struct dev_pm_ops to the same routine.  There are | 
|  | 946 | a few convenience macros that can be used for this purpose. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 947 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 948 | The SIMPLE_DEV_PM_OPS macro declares a struct dev_pm_ops object with one | 
|  | 949 | suspend routine pointed to by the .suspend(), .freeze(), and .poweroff() | 
|  | 950 | members and one resume routine pointed to by the .resume(), .thaw(), and | 
|  | 951 | .restore() members.  The other function pointers in this struct dev_pm_ops are | 
|  | 952 | unset. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 953 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 954 | The UNIVERSAL_DEV_PM_OPS macro is similar to SIMPLE_DEV_PM_OPS, but it | 
|  | 955 | additionally sets the .runtime_resume() pointer to the same value as | 
|  | 956 | .resume() (and .thaw() and .restore()) and the .runtime_suspend() pointer to |
|  | 957 | the same value as .suspend() (and .freeze() and .poweroff()). | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 958 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 959 | The SET_SYSTEM_SLEEP_PM_OPS macro can be used inside a declaration of struct |
|  | 960 | dev_pm_ops to indicate that one suspend routine is to be pointed to by the | 
|  | 961 | .suspend(), .freeze(), and .poweroff() members and one resume routine is to | 
|  | 962 | be pointed to by the .resume(), .thaw(), and .restore() members. | 
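
The effect of these macros can be illustrated with a userspace model.  The
struct dev_pm_ops below is a reduced stand-in containing only the members
discussed here, and the macro bodies are modeled on the descriptions above;
treat their exact expansion, SET_RUNTIME_PM_OPS and the fourth (idle) argument
of UNIVERSAL_DEV_PM_OPS as assumptions to be checked against
include/linux/pm.h.  The foo_* names are hypothetical.

```c
/* Userspace model only; a real driver includes <linux/pm.h> instead. */
#include <stddef.h>

struct device;			/* opaque in this model */

struct dev_pm_ops {		/* reduced stand-in for the kernel structure */
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
	int (*freeze)(struct device *dev);
	int (*thaw)(struct device *dev);
	int (*poweroff)(struct device *dev);
	int (*restore)(struct device *dev);
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
};

#define SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
	.suspend = suspend_fn, .resume = resume_fn, \
	.freeze = suspend_fn, .thaw = resume_fn, \
	.poweroff = suspend_fn, .restore = resume_fn,

#define SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
	.runtime_suspend = suspend_fn, .runtime_resume = resume_fn, \
	.runtime_idle = idle_fn,

#define SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
	const struct dev_pm_ops name = { \
		SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
	}

#define UNIVERSAL_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \
	const struct dev_pm_ops name = { \
		SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
		SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
	}

/* Hypothetical driver routines. */
static int foo_suspend(struct device *dev) { (void)dev; return 0; }
static int foo_resume(struct device *dev) { (void)dev; return 0; }
static int foo_runtime_idle(struct device *dev) { (void)dev; return 0; }

/* System sleep callbacks only; runtime PM pointers stay unset. */
static SIMPLE_DEV_PM_OPS(foo_simple_pm_ops, foo_suspend, foo_resume);

/* The same two routines reused for runtime PM as well. */
static UNIVERSAL_DEV_PM_OPS(foo_universal_pm_ops, foo_suspend, foo_resume,
			    foo_runtime_idle);

/* SET_SYSTEM_SLEEP_PM_OPS used inside a hand-written declaration. */
static const struct dev_pm_ops foo_mixed_pm_ops = {
	SET_SYSTEM_SLEEP_PM_OPS(foo_suspend, foo_resume)
	.runtime_idle = foo_runtime_idle,
};
```

The third form is the one to reach for when the system sleep callbacks can be
shared but the runtime PM callbacks need separate routines.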
| pavel@ucw.cz | 21d6b7e | 2005-06-25 14:55:16 -0700 | [diff] [blame] | 963 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 964 | 3.2. Device Runtime Power Management | 
|  | 965 | ------------------------------------ | 
|  | 966 | In addition to providing device power management callbacks, PCI device drivers |
|  | 967 | are responsible for controlling the runtime power management (runtime PM) of | 
|  | 968 | their devices. | 
| pavel@ucw.cz | 21d6b7e | 2005-06-25 14:55:16 -0700 | [diff] [blame] | 969 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 970 | The PCI device runtime PM is optional, but it is recommended that PCI device | 
|  | 971 | drivers implement it at least in the cases where there is a reliable way of | 
|  | 972 | verifying that the device is not used (like when the network cable is detached | 
|  | 973 | from an Ethernet adapter or there are no devices attached to a USB controller). | 
| pavel@ucw.cz | 21d6b7e | 2005-06-25 14:55:16 -0700 | [diff] [blame] | 974 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 975 | To support the PCI runtime PM the driver first needs to implement the | 
|  | 976 | runtime_suspend() and runtime_resume() callbacks.  It also may need to implement | 
|  | 977 | the runtime_idle() callback to prevent the device from being suspended again | 
|  | 978 | every time right after the runtime_resume() callback has returned | 
|  | 979 | (alternatively, the runtime_suspend() callback will have to check if the | 
|  | 980 | device should really be suspended and return -EAGAIN if that is not the case). | 
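
The alternative mentioned in parentheses, a runtime_suspend() callback that
refuses to suspend a device that is still in use, might look like the
following userspace sketch.  struct model_dev and its link_up flag are
hypothetical; in a real driver the check would be made against actual device
state and the callbacks would take a struct device pointer.

```c
/* Userspace model only; struct model_dev is a hypothetical stand-in. */
#include <errno.h>
#include <stdbool.h>

struct model_dev {
	bool link_up;		/* e.g. a network cable is attached */
	bool suspended;
};

/* Refuse to suspend while the device is in use. */
static int foo_runtime_suspend(struct model_dev *dev)
{
	if (dev->link_up)
		return -EAGAIN;	/* not now; the device stays active */

	/* A real driver would quiesce the hardware here. */
	dev->suspended = true;
	return 0;
}

/* Bring the device back to the state in which it can process I/O. */
static int foo_runtime_resume(struct model_dev *dev)
{
	dev->suspended = false;
	return 0;
}
```

Putting the check into runtime_suspend() keeps the decision in one place, at
the cost of the callback being invoked for every suspend attempt even when it
is going to fail.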
| pavel@ucw.cz | 21d6b7e | 2005-06-25 14:55:16 -0700 | [diff] [blame] | 981 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 982 | The runtime PM of PCI devices is disabled by default.  It is also blocked by | 
|  | 983 | pci_pm_init(), which runs the pm_runtime_forbid() helper function.  If a PCI |
|  | 984 | driver implements the runtime PM callbacks and intends to use the runtime PM | 
|  | 985 | framework provided by the PM core and the PCI subsystem, it should enable this | 
|  | 986 | feature by executing the pm_runtime_enable() helper function.  However, the | 
|  | 987 | driver should not call the pm_runtime_allow() helper function unblocking | 
|  | 988 | the runtime PM of the device.  Instead, it should allow user space or some | 
|  | 989 | platform-specific code to do that (user space can do it via sysfs), although | 
|  | 990 | once it has called pm_runtime_enable(), it must be prepared to handle the | 
|  | 991 | runtime PM of the device correctly as soon as pm_runtime_allow() is called | 
|  | 992 | (which may happen at any time).  [It also is possible that user space causes | 
|  | 993 | pm_runtime_allow() to be called via sysfs before the driver is loaded, so in | 
|  | 994 | fact the driver has to be prepared to handle the runtime PM of the device as | 
|  | 995 | soon as it calls pm_runtime_enable().] | 
| pavel@ucw.cz | 21d6b7e | 2005-06-25 14:55:16 -0700 | [diff] [blame] | 996 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 997 | The runtime PM framework works by processing requests to suspend or resume | 
|  | 998 | devices, or to check if they are idle (in which case it is reasonable to |
|  | 999 | subsequently request that they be suspended).  These requests are represented | 
|  | 1000 | by work items put into the power management workqueue, pm_wq.  Although there | 
|  | 1001 | are a few situations in which power management requests are automatically | 
|  | 1002 | queued by the PM core (for example, after processing a request to resume a | 
|  | 1003 | device the PM core automatically queues a request to check if the device is | 
|  | 1004 | idle), device drivers are generally responsible for queuing power management | 
|  | 1005 | requests for their devices.  For this purpose they should use the runtime PM | 
|  | 1006 | helper functions provided by the PM core, discussed in | 
|  | 1007 | Documentation/power/runtime_pm.txt. | 
| pavel@ucw.cz | 21d6b7e | 2005-06-25 14:55:16 -0700 | [diff] [blame] | 1008 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 1009 | Devices can also be suspended and resumed synchronously, without placing a | 
|  | 1010 | request into pm_wq.  In the majority of cases this also is done by their | 
|  | 1011 | drivers, which use helper functions provided by the PM core for this purpose. |
| pavel@ucw.cz | 21d6b7e | 2005-06-25 14:55:16 -0700 | [diff] [blame] | 1012 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 1013 | For more information on the runtime PM of devices refer to | 
|  | 1014 | Documentation/power/runtime_pm.txt. | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1015 |  | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1016 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 1017 | 4. Resources | 
|  | 1018 | ============ | 
| Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1019 |  | 
| Rafael J. Wysocki | b799957 | 2010-05-18 00:23:24 +0200 | [diff] [blame] | 1020 | PCI Local Bus Specification, Rev. 3.0 | 
|  | 1021 | PCI Bus Power Management Interface Specification, Rev. 1.2 | 
|  | 1022 | Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b | 
|  | 1023 | PCI Express Base Specification, Rev. 2.0 | 
|  | 1024 | Documentation/power/devices.txt | 
|  | 1025 | Documentation/power/runtime_pm.txt |