Run-time Power Management Framework for I/O Devices

(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
(C) 2010 Alan Stern <stern@rowland.harvard.edu>

1. Introduction

Support for run-time power management (run-time PM) of I/O devices is provided
at the power management core (PM core) level by means of:

* The power management workqueue pm_wq in which bus types and device drivers can
  put their PM-related work items.  It is strongly recommended that pm_wq be
  used for queuing all work items related to run-time PM, because this allows
  them to be synchronized with system-wide power transitions (suspend to RAM,
  hibernation and resume from system sleep states).  pm_wq is declared in
  include/linux/pm_runtime.h and defined in kernel/power/main.c.

* A number of run-time PM fields in the 'power' member of 'struct device' (which
  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
  be used for synchronizing run-time PM operations with one another.

* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
  include/linux/pm.h).

* A set of helper functions defined in drivers/base/power/runtime.c that can be
  used for carrying out run-time PM operations in such a way that the
  synchronization between them is taken care of by the PM core.  Bus types and
  device drivers are encouraged to use these functions.

The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
fields of 'struct dev_pm_info' and the core helper functions provided for
run-time PM are described below.

2. Device Run-time PM Callbacks

There are three device run-time PM callbacks defined in 'struct dev_pm_ops':

struct dev_pm_ops {
	...
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
	...
};

The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks are
executed by the PM core for either the device type, or the class (if the device
type's struct dev_pm_ops object does not exist), or the bus type (if the
device type's and class's struct dev_pm_ops objects do not exist) of the given
device (this allows device types to override callbacks provided by bus types or
classes if necessary).  The bus type, device type and class callbacks are
referred to as subsystem-level callbacks in what follows.
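
The selection order can be illustrated with a small userspace model in C.
Everything in it (struct model_pm_ops, struct model_device,
model_subsys_pm_ops()) is hypothetical and written only for this illustration;
it is not the PM core's code and it models nothing but the device type, class,
bus type precedence described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dev_pm_ops: only the three run-time
 * callbacks are kept, with a simplified signature. */
struct model_pm_ops {
	int (*runtime_suspend)(void);
	int (*runtime_resume)(void);
	int (*runtime_idle)(void);
};

/* Hypothetical device: one optional ops pointer per subsystem. */
struct model_device {
	const struct model_pm_ops *type_pm;	/* device type, may be NULL */
	const struct model_pm_ops *class_pm;	/* class, may be NULL */
	const struct model_pm_ops *bus_pm;	/* bus type, may be NULL */
};

/* Pick the subsystem-level callbacks: device type first, then class,
 * then bus type; NULL means that no subsystem provides any. */
static const struct model_pm_ops *
model_subsys_pm_ops(const struct model_device *dev)
{
	assert(dev != NULL);
	if (dev->type_pm)
		return dev->type_pm;
	if (dev->class_pm)
		return dev->class_pm;
	return dev->bus_pm;
}
```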

By default, the callbacks are always invoked in process context with interrupts
enabled.  However, subsystems can use the pm_runtime_irq_safe() helper function
to tell the PM core that a device's ->runtime_suspend() and ->runtime_resume()
callbacks should be invoked in atomic context with interrupts disabled
(->runtime_idle() is still invoked the default way).  This implies that these
callback routines must not block or sleep, but it also means that the
synchronous helper functions listed at the end of Section 4 can be used within
an interrupt handler or in an atomic context.

The subsystem-level suspend callback is _entirely_ _responsible_ for handling
the suspend of the device as appropriate, which may, but need not, include
executing the device driver's own ->runtime_suspend() callback (from the
PM core's point of view it is not necessary to implement a ->runtime_suspend()
callback in a device driver as long as the subsystem-level suspend callback
knows what to do to handle the device).

  * Once the subsystem-level suspend callback has completed successfully
    for a given device, the PM core regards the device as suspended, which need
    not mean that the device has been put into a low power state.  It is
    supposed to mean, however, that the device will not process data and will
    not communicate with the CPU(s) and RAM until the subsystem-level resume
    callback is executed for it.  The run-time PM status of a device after
    successful execution of the subsystem-level suspend callback is 'suspended'.

  * If the subsystem-level suspend callback returns -EBUSY or -EAGAIN,
    the device's run-time PM status remains 'active', which means that the
    device _must_ be fully operational afterwards.

  * If the subsystem-level suspend callback returns an error code different
    from -EBUSY or -EAGAIN, the PM core regards this as a fatal error and will
    refuse to run the helper functions described in Section 4 for the device,
    until its status is directly set either to 'active' or to 'suspended'
    (the PM core provides special helper functions for this purpose).

In particular, if the driver requires remote wake-up capability (i.e. a hardware
mechanism allowing the device to request a change of its power state, such as
PCI PME) for proper functioning and device_run_wake() returns 'false' for the
device, then ->runtime_suspend() should return -EBUSY.  On the other hand, if
device_run_wake() returns 'true' for the device and the device is put into a low
power state during the execution of the subsystem-level suspend callback, it is
expected that remote wake-up will be enabled for the device.  Generally, remote
wake-up should be enabled for all input devices put into a low power state at
run time.

The subsystem-level resume callback is _entirely_ _responsible_ for handling the
resume of the device as appropriate, which may, but need not, include executing
the device driver's own ->runtime_resume() callback (from the PM core's point of
view it is not necessary to implement a ->runtime_resume() callback in a device
driver as long as the subsystem-level resume callback knows what to do to handle
the device).

  * Once the subsystem-level resume callback has completed successfully, the PM
    core regards the device as fully operational, which means that the device
    _must_ be able to complete I/O operations as needed.  The run-time PM status
    of the device is then 'active'.

  * If the subsystem-level resume callback returns an error code, the PM core
    regards this as a fatal error and will refuse to run the helper functions
    described in Section 4 for the device, until its status is directly set
    either to 'active' or to 'suspended' (the PM core provides special helper
    functions for this purpose).

The subsystem-level idle callback is executed by the PM core whenever the device
appears to be idle, which is indicated to the PM core by two counters, the
device's usage counter and the counter of 'active' children of the device.

  * If either of these counters is decreased using a helper function provided
    by the PM core and it becomes equal to zero, the other counter is checked.
    If that counter is also equal to zero, the PM core executes the
    subsystem-level idle callback with the device as an argument.
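
The counter logic just described can be modelled in a few lines of userspace C.
The structure and function names are hypothetical, plain ints replace the
kernel's atomic_t counters, and the 'power.ignore_children' flag is not
modelled.

```c
#include <assert.h>

/* Hypothetical model of the two counters kept in dev->power. */
struct model_dev {
	int usage_count;	/* cf. power.usage_count */
	int child_count;	/* cf. power.child_count ('active' children) */
	int idle_calls;		/* times the idle callback would have run */
};

/* Run the (modelled) idle callback only if both counters are zero. */
static void model_check_idle(struct model_dev *dev)
{
	if (dev->usage_count == 0 && dev->child_count == 0)
		dev->idle_calls++;
}

/* Drop a usage reference; at zero, the other counter is checked. */
static void model_usage_dec(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		model_check_idle(dev);
}

/* An 'active' child went away; at zero, the usage counter is checked. */
static void model_child_dec(struct model_dev *dev)
{
	assert(dev->child_count > 0);
	if (--dev->child_count == 0)
		model_check_idle(dev);
}
```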

The action performed by a subsystem-level idle callback is totally dependent on
the subsystem in question, but the expected and recommended action is to check
if the device can be suspended (i.e. if all of the conditions necessary for
suspending the device are satisfied) and to queue up a suspend request for the
device in that case.  The value returned by this callback is ignored by the PM
core.

The helper functions provided by the PM core, described in Section 4, guarantee
that the following constraints are met with respect to the subsystem-level
run-time PM callbacks:

  (1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
      ->runtime_suspend() in parallel with ->runtime_resume() or with another
      instance of ->runtime_suspend() for the same device) with the exception
      that ->runtime_suspend() or ->runtime_resume() can be executed in
      parallel with ->runtime_idle() (although ->runtime_idle() will not be
      started while any of the other callbacks is being executed for the same
      device).

  (2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
      devices (i.e. the PM core will only execute ->runtime_idle() or
      ->runtime_suspend() for devices whose run-time PM status is 'active').

  (3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
      whose usage counter is equal to zero _and_ whose counter of 'active'
      children is equal to zero or whose 'power.ignore_children' flag is set.

  (4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
      PM core will only execute ->runtime_resume() for devices whose run-time
      PM status is 'suspended').

Additionally, the helper functions provided by the PM core obey the following
rules:

  * If ->runtime_suspend() is about to be executed or there's a pending request
    to execute it, ->runtime_idle() will not be executed for the same device.

  * A request to execute or to schedule the execution of ->runtime_suspend()
    will cancel any pending requests to execute ->runtime_idle() for the same
    device.

  * If ->runtime_resume() is about to be executed or there's a pending request
    to execute it, the other callbacks will not be executed for the same device.

  * A request to execute ->runtime_resume() will cancel any pending or
    scheduled requests to execute the other callbacks for the same device,
    except for scheduled autosuspends.

3. Run-time PM Device Fields

The following device run-time PM fields are present in 'struct dev_pm_info', as
defined in include/linux/pm.h:

  struct timer_list suspend_timer;
    - timer used for scheduling (delayed) suspend and autosuspend requests

  unsigned long timer_expires;
    - timer expiration time, in jiffies (if this is different from zero, the
      timer is running and will expire at that time, otherwise the timer is not
      running)

  struct work_struct work;
    - work structure used for queuing up requests (i.e. work items in pm_wq)

  wait_queue_head_t wait_queue;
    - wait queue used if any of the helper functions needs to wait for another
      one to complete

  spinlock_t lock;
    - lock used for synchronisation

  atomic_t usage_count;
    - the usage counter of the device

  atomic_t child_count;
    - the count of 'active' children of the device

  unsigned int ignore_children;
    - if set, the value of child_count is ignored (but still updated)

  unsigned int disable_depth;
    - used for disabling the helper functions (they work normally if this is
      equal to zero); the initial value of it is 1 (i.e. run-time PM is
      initially disabled for all devices)

  int runtime_error;
    - if set, there was a fatal error (one of the callbacks returned an error
      code as described in Section 2), so the helper functions will not work
      until this flag is cleared; this is the error code returned by the
      failing callback

  unsigned int idle_notification;
    - if set, ->runtime_idle() is being executed

  unsigned int request_pending;
    - if set, there's a pending request (i.e. a work item queued up into pm_wq)

  enum rpm_request request;
    - type of request that's pending (valid if request_pending is set)

  unsigned int deferred_resume;
    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
      being executed for that device and it is not practical to wait for the
      suspend to complete; means "start a resume as soon as you've suspended"

  unsigned int run_wake;
    - set if the device is capable of generating run-time wake-up events

  enum rpm_status runtime_status;
    - the run-time PM status of the device; this field's initial value is
      RPM_SUSPENDED, which means that each device is initially regarded by the
      PM core as 'suspended', regardless of its real hardware status

  unsigned int runtime_auto;
    - if set, indicates that the user space has allowed the device driver to
      power manage the device at run time via the /sys/devices/.../power/control
      interface; it may only be modified with the help of the pm_runtime_allow()
      and pm_runtime_forbid() helper functions

  unsigned int no_callbacks;
    - indicates that the device does not use the run-time PM callbacks (see
      Section 8); it may be modified only by the pm_runtime_no_callbacks()
      helper function

  unsigned int irq_safe;
    - indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
      will be invoked with the spinlock held and interrupts disabled

  unsigned int use_autosuspend;
    - indicates that the device's driver supports delayed autosuspend (see
      Section 9); it may be modified only by the
      pm_runtime{_dont}_use_autosuspend() helper functions

  unsigned int timer_autosuspends;
    - indicates that the PM core should attempt to carry out an autosuspend
      when the timer expires rather than a normal suspend

  int autosuspend_delay;
    - the delay time (in milliseconds) to be used for autosuspend

  unsigned long last_busy;
    - the time (in jiffies) when the pm_runtime_mark_last_busy() helper
      function was last called for this device; used in calculating inactivity
      periods for autosuspend

All of the above fields are members of the 'power' member of 'struct device'.

4. Run-time PM Device Helper Functions

The following run-time PM helper functions are defined in
drivers/base/power/runtime.c and include/linux/pm_runtime.h:

  void pm_runtime_init(struct device *dev);
    - initialize the device run-time PM fields in 'struct dev_pm_info'

  void pm_runtime_remove(struct device *dev);
    - make sure that the run-time PM of the device will be disabled after
      removing the device from the device hierarchy

  int pm_runtime_idle(struct device *dev);
    - execute the subsystem-level idle callback for the device; returns 0 on
      success or an error code on failure, where -EINPROGRESS means that
      ->runtime_idle() is already being executed

  int pm_runtime_suspend(struct device *dev);
    - execute the subsystem-level suspend callback for the device; returns 0 on
      success, 1 if the device's run-time PM status was already 'suspended', or
      an error code on failure, where -EAGAIN or -EBUSY means it is safe to
      attempt to suspend the device again in the future

  int pm_runtime_autosuspend(struct device *dev);
    - same as pm_runtime_suspend() except that the autosuspend delay is taken
      into account; if pm_runtime_autosuspend_expiration() says the delay has
      not yet expired then an autosuspend is scheduled for the appropriate time
      and 0 is returned

  int pm_runtime_resume(struct device *dev);
    - execute the subsystem-level resume callback for the device; returns 0 on
      success, 1 if the device's run-time PM status was already 'active', or
      an error code on failure, where -EAGAIN means it may be safe to attempt
      to resume the device again in the future, but 'power.runtime_error'
      should be checked additionally

  int pm_request_idle(struct device *dev);
    - submit a request to execute the subsystem-level idle callback for the
      device (the request is represented by a work item in pm_wq); returns 0 on
      success or an error code if the request has not been queued up

  int pm_request_autosuspend(struct device *dev);
    - schedule the execution of the subsystem-level suspend callback for the
      device when the autosuspend delay has expired; if the delay has already
      expired then the work item is queued up immediately

  int pm_schedule_suspend(struct device *dev, unsigned int delay);
    - schedule the execution of the subsystem-level suspend callback for the
      device in the future, where 'delay' is the time to wait before queuing up
      a suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the
      work item is queued up immediately); returns 0 on success, 1 if the
      device's run-time PM status was already 'suspended', or an error code if
      the request hasn't been scheduled (or queued up if 'delay' is 0); if the
      execution of ->runtime_suspend() is already scheduled and not yet
      expired, the new value of 'delay' will be used as the time to wait

  int pm_request_resume(struct device *dev);
    - submit a request to execute the subsystem-level resume callback for the
      device (the request is represented by a work item in pm_wq); returns 0 on
      success, 1 if the device's run-time PM status was already 'active', or
      an error code if the request hasn't been queued up

  void pm_runtime_get_noresume(struct device *dev);
    - increment the device's usage counter

  int pm_runtime_get(struct device *dev);
    - increment the device's usage counter, run pm_request_resume(dev) and
      return its result

  int pm_runtime_get_sync(struct device *dev);
    - increment the device's usage counter, run pm_runtime_resume(dev) and
      return its result

  void pm_runtime_put_noidle(struct device *dev);
    - decrement the device's usage counter

  int pm_runtime_put(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_request_idle(dev) and return its result

  int pm_runtime_put_autosuspend(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_request_autosuspend(dev) and return its result

  int pm_runtime_put_sync(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_runtime_idle(dev) and return its result

  int pm_runtime_put_sync_suspend(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_runtime_suspend(dev) and return its result

  int pm_runtime_put_sync_autosuspend(struct device *dev);
    - decrement the device's usage counter; if the result is 0 then run
      pm_runtime_autosuspend(dev) and return its result
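
The get/put helpers pair a usage counter update with a resume or idle action.
The hypothetical userspace model below mirrors pm_runtime_get_sync() and
pm_runtime_put_sync() as described above; its 'resume' and 'idle' steps only
record that they would have run, and returning 0 while the counter is still
positive is a choice of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state. */
struct model_dev {
	int usage_count;	/* cf. power.usage_count */
	bool active;		/* run-time PM status is 'active' */
	int idle_runs;		/* times the idle step would have run */
};

/* Stand-in for pm_runtime_resume(): returns 1 if already 'active'. */
static int model_resume(struct model_dev *dev)
{
	if (dev->active)
		return 1;
	dev->active = true;
	return 0;
}

/* Stand-in for pm_runtime_idle(). */
static int model_idle(struct model_dev *dev)
{
	dev->idle_runs++;
	return 0;
}

/* Like pm_runtime_get_sync(): take a reference, then resume. */
static int model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	return model_resume(dev);
}

/* Like pm_runtime_put_sync(): drop a reference; idle only at zero. */
static int model_put_sync(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count != 0)
		return 0;	/* still in use: nothing else to do */
	return model_idle(dev);
}
```

Since pm_runtime_get_sync() increments the usage counter before running
pm_runtime_resume(dev), a caller that gives up after a failed resume still holds
a reference, which it needs to drop (for example with pm_runtime_put_noidle()).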

  void pm_runtime_enable(struct device *dev);
    - enable the run-time PM helper functions to run the device's
      subsystem-level run-time PM callbacks described in Section 2

  int pm_runtime_disable(struct device *dev);
    - prevent the run-time PM helper functions from running subsystem-level
      run-time PM callbacks for the device, make sure that all of the pending
      run-time PM operations on the device are either completed or canceled;
      returns 1 if there was a resume request pending and it was necessary to
      execute the subsystem-level resume callback for the device to satisfy that
      request, otherwise 0 is returned

  void pm_suspend_ignore_children(struct device *dev, bool enable);
    - set/unset the power.ignore_children flag of the device

  int pm_runtime_set_active(struct device *dev);
    - clear the device's 'power.runtime_error' flag, set the device's run-time
      PM status to 'active' and update its parent's counter of 'active'
      children as appropriate (it is only valid to use this function if
      'power.runtime_error' is set or 'power.disable_depth' is greater than
      zero); it will fail and return an error code if the device has a parent
      which is not active and whose 'power.ignore_children' flag is unset

  void pm_runtime_set_suspended(struct device *dev);
    - clear the device's 'power.runtime_error' flag, set the device's run-time
      PM status to 'suspended' and update its parent's counter of 'active'
      children as appropriate (it is only valid to use this function if
      'power.runtime_error' is set or 'power.disable_depth' is greater than
      zero)

  bool pm_runtime_suspended(struct device *dev);
    - return true if the device's run-time PM status is 'suspended' and its
      'power.disable_depth' field is equal to zero, or false otherwise
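
The rules for pm_runtime_set_active() can be modelled as follows.  This is a
hypothetical userspace sketch: the choice of -EAGAIN for use while run-time PM
is enabled without a recorded error, and of -EBUSY for a blocking parent, is an
assumption of the model (the text above only says that an error code is
returned in the latter case).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Hypothetical subset of dev->power plus a parent pointer. */
struct model_dev {
	struct model_dev *parent;	/* NULL for a root device */
	enum model_status status;
	int disable_depth;
	int runtime_error;
	int child_count;		/* 'active' children */
	bool ignore_children;
};

static int model_set_active(struct model_dev *dev)
{
	struct model_dev *p;

	assert(dev != NULL);
	/* Only valid with run-time PM disabled or after a fatal error. */
	if (dev->disable_depth == 0 && dev->runtime_error == 0)
		return -EAGAIN;
	dev->runtime_error = 0;

	p = dev->parent;
	if (dev->status != MODEL_ACTIVE && p != NULL) {
		/* An inactive parent that does not ignore children blocks this. */
		if (p->status != MODEL_ACTIVE && !p->ignore_children)
			return -EBUSY;
		p->child_count++;	/* updated even if ignore_children is set */
	}
	dev->status = MODEL_ACTIVE;
	return 0;
}
```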

  void pm_runtime_allow(struct device *dev);
    - set the power.runtime_auto flag for the device and decrease its usage
      counter (used by the /sys/devices/.../power/control interface to
      effectively allow the device to be power managed at run time)

  void pm_runtime_forbid(struct device *dev);
    - unset the power.runtime_auto flag for the device and increase its usage
      counter (used by the /sys/devices/.../power/control interface to
      effectively prevent the device from being power managed at run time)

  void pm_runtime_no_callbacks(struct device *dev);
    - set the power.no_callbacks flag for the device and remove the run-time
      PM attributes from /sys/devices/.../power (or prevent them from being
      added when the device is registered)

  void pm_runtime_irq_safe(struct device *dev);
    - set the power.irq_safe flag for the device, causing the run-time PM
      suspend and resume callbacks (but not the idle callback) to be invoked
      with interrupts disabled

  void pm_runtime_mark_last_busy(struct device *dev);
    - set the power.last_busy field to the current time

  void pm_runtime_use_autosuspend(struct device *dev);
    - set the power.use_autosuspend flag, enabling autosuspend delays

  void pm_runtime_dont_use_autosuspend(struct device *dev);
    - clear the power.use_autosuspend flag, disabling autosuspend delays

  void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
    - set the power.autosuspend_delay value to 'delay' (expressed in
      milliseconds); if 'delay' is negative then run-time suspends are
      prevented

  unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
    - calculate the time when the current autosuspend delay period will expire,
      based on power.last_busy and power.autosuspend_delay; if the delay time
      is 1000 ms or larger then the expiration time is rounded up to the
      nearest second; returns 0 if the delay period has already expired or
      power.use_autosuspend isn't set, otherwise returns the expiration time
      in jiffies
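
As a worked example of the expiration calculation, the userspace model below
assumes HZ = 100 (so one jiffy is 10 ms), rounds millisecond delays up to whole
jiffies, rounds delays of 1000 ms or more up to the next whole second as the
text describes, and ignores jiffies wrap-around.  The names are hypothetical
and negative delays are not modelled.  With last_busy = 1000 and a 500 ms
delay, the expiration is 1000 + 50 = 1050; with a 1500 ms delay it is
1000 + 150 = 1150, rounded up to 1200.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HZ 100UL	/* assumed tick rate: 100 jiffies per second */

/* Hypothetical subset of the autosuspend-related fields. */
struct model_dev {
	bool use_autosuspend;
	int autosuspend_delay;		/* ms; negative values not modelled */
	unsigned long last_busy;	/* jiffies */
};

/* Milliseconds to jiffies, rounded up so a delay is never shortened. */
static unsigned long model_ms_to_jiffies(unsigned int ms)
{
	return ((unsigned long)ms * MODEL_HZ + 999UL) / 1000UL;
}

/* Return the expiration time in jiffies, or 0 if the delay period has
 * already expired at 'now' or autosuspend is not in use. */
static unsigned long model_autosuspend_expiration(const struct model_dev *dev,
						   unsigned long now)
{
	unsigned long expires;

	if (!dev->use_autosuspend)
		return 0;
	assert(dev->autosuspend_delay >= 0);

	expires = dev->last_busy +
		  model_ms_to_jiffies((unsigned int)dev->autosuspend_delay);
	if (dev->autosuspend_delay >= 1000)	/* round up to a whole second */
		expires = (expires + MODEL_HZ - 1) / MODEL_HZ * MODEL_HZ;

	return now >= expires ? 0 : expires;
}
```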

It is safe to execute the following helper functions from interrupt context:

  pm_request_idle()
  pm_request_autosuspend()
  pm_schedule_suspend()
  pm_request_resume()
  pm_runtime_get_noresume()
  pm_runtime_get()
  pm_runtime_put_noidle()
  pm_runtime_put()
  pm_runtime_put_autosuspend()
  pm_runtime_enable()
  pm_suspend_ignore_children()
  pm_runtime_set_active()
  pm_runtime_set_suspended()
  pm_runtime_suspended()
  pm_runtime_mark_last_busy()
  pm_runtime_autosuspend_expiration()

If pm_runtime_irq_safe() has been called for a device then the following helper
functions may also be used in interrupt context:

  pm_runtime_suspend()
  pm_runtime_autosuspend()
  pm_runtime_resume()
  pm_runtime_get_sync()
  pm_runtime_put_sync_suspend()

5. Run-time PM Initialization, Device Probing and Removal

Initially, run-time PM is disabled for all devices, which means that the
majority of the run-time PM helper functions described in Section 4 will return
-EAGAIN until pm_runtime_enable() is called for the device.

In addition to that, the initial run-time PM status of all devices is
'suspended', but it need not reflect the actual physical state of the device.
Thus, if the device is initially active (i.e. it is able to process I/O), its
run-time PM status must be changed to 'active', with the help of
pm_runtime_set_active(), before pm_runtime_enable() is called for the device.

However, if the device has a parent and the parent's run-time PM is enabled,
calling pm_runtime_set_active() for the device will affect the parent, unless
the parent's 'power.ignore_children' flag is set.  Namely, if that flag is
unset, the parent won't be able to suspend at run time, using the PM core's
helper functions, as long as the child's status is 'active', even if the
child's run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been
called for the child yet or pm_runtime_disable() has been called for it).  For
this reason, once pm_runtime_set_active() has been called for the device,
pm_runtime_enable() should be called for it too as soon as reasonably possible
or its run-time PM status should be changed back to 'suspended' with the help
of pm_runtime_set_suspended().

If the default initial run-time PM status of the device (i.e. 'suspended')
reflects the actual state of the device, its bus type's or its driver's
->probe() callback will likely need to wake it up using one of the PM core's
helper functions described in Section 4.  In that case, pm_runtime_resume()
should be used.  Of course, for this purpose the device's run-time PM has to be
enabled earlier by calling pm_runtime_enable().
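
The two initialization orders described in this section (pm_runtime_set_active()
before pm_runtime_enable() for a device that is already powered,
pm_runtime_enable() before pm_runtime_resume() for one that is not) can be
traced with the hypothetical userspace model below, which tracks only
'power.disable_depth' and the run-time PM status.  Parents, error flags and the
exact return codes of the real helpers are simplified away.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: only disable_depth and the status are tracked. */
struct model_dev {
	int disable_depth;	/* > 0: helpers refuse to run */
	bool active;		/* run-time PM status */
};

static void model_init(struct model_dev *dev)
{
	dev->disable_depth = 1;	/* run-time PM starts out disabled */
	dev->active = false;	/* and the status starts out 'suspended' */
}

static void model_enable(struct model_dev *dev)
{
	assert(dev->disable_depth > 0);	/* model treats imbalance as a bug */
	dev->disable_depth--;
}

/* Only meaningful while disabled (error flag not modelled). */
static int model_set_active(struct model_dev *dev)
{
	if (dev->disable_depth == 0)
		return -EAGAIN;
	dev->active = true;
	return 0;
}

static int model_resume(struct model_dev *dev)
{
	if (dev->disable_depth > 0)
		return -EAGAIN;	/* helpers fail until enabled */
	if (dev->active)
		return 1;	/* already 'active' */
	dev->active = true;
	return 0;
}
```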

If the ->probe() or ->remove() callback of the device's bus type or driver runs
pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
they will fail and return -EAGAIN, because the device's usage counter is
incremented by the core before executing ->probe() and ->remove().  Still, it
may be desirable to suspend the device as soon as ->probe() or ->remove() has
finished, so the PM core uses pm_runtime_put_sync() to drop that usage counter
reference and invoke the subsystem-level idle callback for the device at that
time.

The user space can effectively prevent the device's driver from power managing
it at run time by changing the value of its /sys/devices/.../power/control
attribute to "on", which causes pm_runtime_forbid() to be called.  In principle,
this mechanism may also be used by the driver to effectively turn off the
run-time power management of the device until the user space turns it on.
Namely, during the initialization the driver can make sure that the run-time PM
status of the device is 'active' and call pm_runtime_forbid().  It should be
noted, however, that if the user space has already intentionally changed the
value of /sys/devices/.../power/control to "auto" to allow the driver to power
manage the device at run time, the driver may confuse user space by using
pm_runtime_forbid() this way.

6. Run-time PM and System Sleep

Run-time PM and system sleep (i.e., system suspend and hibernation, also known
as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of
ways.  If a device is active when a system sleep starts, everything is
straightforward.  But what should happen if the device is already suspended?

The device may have different wake-up settings for run-time PM and system sleep.
For example, remote wake-up may be enabled for run-time suspend but disallowed
for system sleep (device_may_wakeup(dev) returns 'false').  When this happens,
the subsystem-level system suspend callback is responsible for changing the
device's wake-up setting (it may leave that to the device driver's system
suspend routine).  It may be necessary to resume the device and suspend it again
in order to do so.  The same is true if the driver uses different power levels
or other settings for run-time suspend and system sleep.

During system resume, devices generally should be brought back to full power,
even if they were suspended before the system sleep began.  There are several
reasons for this, including:

  * The device might need to switch power levels, wake-up settings, etc.

  * Remote wake-up events might have been lost by the firmware.

  * The device's children may need the device to be at full power in order
    to resume themselves.

  * The driver's idea of the device state may not agree with the device's
    physical state.  This can happen during resume from hibernation.

  * The device might need to be reset.

  * Even though the device was suspended, if its usage counter was > 0 then most
    likely it would need a run-time resume in the near future anyway.

  * Always going back to full power is simplest.

If the device was suspended before the sleep began, then its run-time PM status
will have to be updated to reflect the actual post-system-sleep status.  The way
to do this is:

	pm_runtime_disable(dev);
	pm_runtime_set_active(dev);
	pm_runtime_enable(dev);

The PM core always increments the run-time usage counter before calling the
->prepare() callback and decrements it after calling the ->complete() callback.
Hence disabling run-time PM temporarily like this will not cause any run-time
suspend callbacks to be lost.

7. Generic Subsystem Callbacks

Subsystems may wish to conserve code space by using the set of generic power
management callbacks provided by the PM core, defined in
drivers/base/power/generic_ops.c:

  int pm_generic_runtime_idle(struct device *dev);
    - invoke the ->runtime_idle() callback provided by the driver of this
      device, if defined, and call pm_runtime_suspend() for this device if the
      return value is 0 or the callback is not defined

  int pm_generic_runtime_suspend(struct device *dev);
    - invoke the ->runtime_suspend() callback provided by the driver of this
      device and return its result, or return -EINVAL if not defined

  int pm_generic_runtime_resume(struct device *dev);
    - invoke the ->runtime_resume() callback provided by the driver of this
      device and return its result, or return -EINVAL if not defined
|  |  | 
|  | int pm_generic_suspend(struct device *dev); | 
|  | - if the device has not been suspended at run time, invoke the ->suspend() | 
|  | callback provided by its driver and return its result, or return 0 if not | 
|  | defined | 
|  |  | 
|  | int pm_generic_resume(struct device *dev); | 
|  | - invoke the ->resume() callback provided by the driver of this device and, | 
|  | if successful, change the device's run-time PM status to 'active' | 
|  |  | 
|  | int pm_generic_freeze(struct device *dev); | 
|  | - if the device has not been suspended at run time, invoke the ->freeze() | 
|  | callback provided by its driver and return its result, or return 0 if not | 
|  | defined | 
|  |  | 
|  | int pm_generic_thaw(struct device *dev); | 
|  | - if the device has not been suspended at run time, invoke the ->thaw() | 
|  | callback provided by its driver and return its result, or return 0 if not | 
|  | defined | 
|  |  | 
|  | int pm_generic_poweroff(struct device *dev); | 
|  | - if the device has not been suspended at run time, invoke the ->poweroff() | 
|  | callback provided by its driver and return its result, or return 0 if not | 
|  | defined | 
|  |  | 
|  | int pm_generic_restore(struct device *dev); | 
|  | - invoke the ->restore() callback provided by the driver of this device and, | 
|  | if successful, change the device's run-time PM status to 'active' | 
|  |  | 
|  | These functions can be assigned to the ->runtime_idle(), ->runtime_suspend(), | 
|  | ->runtime_resume(), ->suspend(), ->resume(), ->freeze(), ->thaw(), ->poweroff(), | 
|  | or ->restore() callback pointers in the subsystem-level dev_pm_ops structures. | 
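The dispatch rules listed above can be illustrated with a small user-space model. This is a sketch, not the code in drivers/base/power/generic_ops.c: the model_* types and functions and the example_* driver callbacks are hypothetical stand-ins for struct device, the driver's dev_pm_ops and pm_runtime_suspend(), and only four of the nine helpers are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* User-space model only: hypothetical stand-ins, not the kernel's
 * struct device or struct dev_pm_ops. */
struct model_dev;

struct model_driver_ops {
	int (*runtime_idle)(struct model_dev *dev);
	int (*runtime_suspend)(struct model_dev *dev);
	int (*suspend)(struct model_dev *dev);
	int (*resume)(struct model_dev *dev);
};

enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	const struct model_driver_ops *ops;	/* NULL: driver has no callbacks */
	enum model_rpm_status runtime_status;
	int suspend_requests;	/* calls to the pm_runtime_suspend() stand-in */
};

/* Stand-in for pm_runtime_suspend(): only records that it was called. */
static void model_request_runtime_suspend(struct model_dev *dev)
{
	dev->suspend_requests++;
}

/* Run ->runtime_idle() if defined; request a suspend if it returns 0
 * or is missing, otherwise pass its error back. */
static int model_generic_runtime_idle(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->ops && dev->ops->runtime_idle) {
		int ret = dev->ops->runtime_idle(dev);

		if (ret)
			return ret;	/* driver vetoed the suspend */
	}
	model_request_runtime_suspend(dev);
	return 0;
}

/* Result of ->runtime_suspend(), or -EINVAL if the driver has none. */
static int model_generic_runtime_suspend(struct model_dev *dev)
{
	if (dev->ops && dev->ops->runtime_suspend)
		return dev->ops->runtime_suspend(dev);
	return -EINVAL;
}

/* Skip ->suspend() when the device is already suspended at run time. */
static int model_generic_suspend(struct model_dev *dev)
{
	if (dev->runtime_status == MODEL_RPM_SUSPENDED)
		return 0;
	if (dev->ops && dev->ops->suspend)
		return dev->ops->suspend(dev);
	return 0;
}

/* Run ->resume(); on success the run-time status becomes 'active'.
 * Returning 0 when ->resume() is missing is an assumption of this model. */
static int model_generic_resume(struct model_dev *dev)
{
	int ret;

	if (!dev->ops || !dev->ops->resume)
		return 0;
	ret = dev->ops->resume(dev);
	if (ret == 0)
		dev->runtime_status = MODEL_RPM_ACTIVE;
	return ret;
}

/* Hypothetical example driver callbacks. */
static int example_busy(struct model_dev *dev) { (void)dev; return -EBUSY; }
static int example_ok(struct model_dev *dev) { (void)dev; return 0; }
```

Keeping the "callback missing" decision inside each helper is what lets a subsystem point its dev_pm_ops at these functions without knowing which callbacks a particular driver implements.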
|  |  | 
|  | If a subsystem wishes to use all of them at the same time, it can simply assign | 
|  | the GENERIC_SUBSYS_PM_OPS macro, defined in include/linux/pm.h, to its | 
|  | dev_pm_ops structure pointer. | 
|  |  | 
|  | Device drivers that wish to use the same function as a system suspend, freeze, | 
|  | poweroff and run-time suspend callback, and similarly for system resume, thaw, | 
|  | restore, and run-time resume, can achieve this with the help of the | 
|  | UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its | 
|  | last argument to NULL). | 
|  |  | 
|  | 8. "No-Callback" Devices | 
|  |  | 
|  | Some "devices" are only logical sub-devices of their parent and cannot be | 
|  | power-managed on their own.  (The prototype example is a USB interface.  Entire | 
|  | USB devices can go into low-power mode or send wake-up requests, but neither is | 
|  | possible for individual interfaces.)  The drivers for these devices have no | 
|  | need of run-time PM callbacks; if the callbacks did exist, ->runtime_suspend() | 
|  | and ->runtime_resume() would always return 0 without doing anything else and | 
|  | ->runtime_idle() would always call pm_runtime_suspend(). | 
|  |  | 
|  | Subsystems can tell the PM core about these devices by calling | 
|  | pm_runtime_no_callbacks().  This should be done after the device structure is | 
|  | initialized and before it is registered (although after device registration is | 
|  | also okay).  The routine will set the device's power.no_callbacks flag and | 
|  | prevent the non-debugging run-time PM sysfs attributes from being created. | 
|  |  | 
|  | When power.no_callbacks is set, the PM core will not invoke the | 
|  | ->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks. | 
|  | Instead it will assume that suspends and resumes always succeed and that idle | 
|  | devices should be suspended. | 
|  |  | 
|  | As a consequence, the PM core will never directly inform the device's subsystem | 
|  | or driver about run-time power changes.  Instead, the driver for the device's | 
|  | parent must take responsibility for telling the device's driver when the | 
|  | parent's power state changes. | 
|  |  | 
|  | 9. Autosuspend, or automatically-delayed suspends | 
|  |  | 
|  | Changing a device's power state isn't free; it requires both time and energy. | 
|  | A device should be put in a low-power state only when there's some reason to | 
|  | think it will remain in that state for a substantial time.  A common heuristic | 
|  | says that a device which hasn't been used for a while is liable to remain | 
|  | unused; following this advice, drivers should not allow devices to be suspended | 
|  | at run time until they have been inactive for some minimum period.  Even when | 
|  | the heuristic ends up being non-optimal, it will still prevent devices from | 
|  | "bouncing" too rapidly between low-power and full-power states. | 
|  |  | 
|  | The term "autosuspend" is a historical remnant.  It doesn't mean that the | 
|  | device is automatically suspended (the subsystem or driver still has to call | 
|  | the appropriate PM routines); rather it means that run-time suspends will | 
|  | automatically be delayed until the desired period of inactivity has elapsed. | 
|  |  | 
|  | Inactivity is determined based on the power.last_busy field.  Drivers should | 
|  | call pm_runtime_mark_last_busy() to update this field after carrying out I/O, | 
|  | typically just before calling pm_runtime_put_autosuspend().  The desired length | 
|  | of the inactivity period is a matter of policy.  Subsystems can set this length | 
|  | initially by calling pm_runtime_set_autosuspend_delay(), but after device | 
|  | registration the length should be controlled by user space, using the | 
|  | /sys/devices/.../power/autosuspend_delay_ms attribute. | 
|  |  | 
|  | In order to use autosuspend, subsystems or drivers must call | 
|  | pm_runtime_use_autosuspend() (preferably before registering the device), and | 
|  | thereafter they should use the various *_autosuspend() helper functions instead | 
|  | of the non-autosuspend counterparts: | 
|  |  | 
|  | Instead of: pm_runtime_suspend    use: pm_runtime_autosuspend; | 
|  | Instead of: pm_schedule_suspend   use: pm_request_autosuspend; | 
|  | Instead of: pm_runtime_put        use: pm_runtime_put_autosuspend; | 
|  | Instead of: pm_runtime_put_sync   use: pm_runtime_put_sync_autosuspend. | 
|  |  | 
|  | Drivers may also continue to use the non-autosuspend helper functions; they | 
|  | will behave normally, not taking the autosuspend delay into account. | 
|  | Similarly, if the power.use_autosuspend field isn't set then the autosuspend | 
|  | helper functions will behave just like the non-autosuspend counterparts. | 
|  |  | 
|  | The implementation is well suited for asynchronous use in interrupt contexts. | 
|  | However, such use inevitably involves races, because the PM core can't | 
|  | synchronize ->runtime_suspend() callbacks with the arrival of I/O requests. | 
|  | This synchronization must be handled by the driver, using its private lock. | 
|  | Here is a schematic pseudo-code example: | 
|  |  | 
|  | void foo_read_or_write(struct foo_priv *foo, void *data) | 
|  | { | 
|  | 	lock(&foo->private_lock); | 
|  | 	add_request_to_io_queue(foo, data); | 
|  | 	if (foo->num_pending_requests++ == 0) | 
|  | 		pm_runtime_get(&foo->dev); | 
|  | 	if (!foo->is_suspended) | 
|  | 		foo_process_next_request(foo); | 
|  | 	unlock(&foo->private_lock); | 
|  | } | 
|  |  | 
|  | void foo_io_completion(struct foo_priv *foo, void *req) | 
|  | { | 
|  | 	lock(&foo->private_lock); | 
|  | 	if (--foo->num_pending_requests == 0) { | 
|  | 		pm_runtime_mark_last_busy(&foo->dev); | 
|  | 		pm_runtime_put_autosuspend(&foo->dev); | 
|  | 	} else { | 
|  | 		foo_process_next_request(foo); | 
|  | 	} | 
|  | 	unlock(&foo->private_lock); | 
|  | 	/* Send req result back to the user ... */ | 
|  | } | 
|  |  | 
|  | int foo_runtime_suspend(struct device *dev) | 
|  | { | 
|  | 	struct foo_priv *foo = container_of(dev, ...); | 
|  | 	int ret = 0; | 
|  |  | 
|  | 	lock(&foo->private_lock); | 
|  | 	if (foo->num_pending_requests > 0) { | 
|  | 		ret = -EBUSY; | 
|  | 	} else { | 
|  | 		/* ... suspend the device ... */ | 
|  | 		foo->is_suspended = 1; | 
|  | 	} | 
|  | 	unlock(&foo->private_lock); | 
|  | 	return ret; | 
|  | } | 
|  |  | 
|  | int foo_runtime_resume(struct device *dev) | 
|  | { | 
|  | 	struct foo_priv *foo = container_of(dev, ...); | 
|  |  | 
|  | 	lock(&foo->private_lock); | 
|  | 	/* ... resume the device ... */ | 
|  | 	foo->is_suspended = 0; | 
|  | 	pm_runtime_mark_last_busy(&foo->dev); | 
|  | 	if (foo->num_pending_requests > 0) | 
|  | 		foo_process_next_request(foo); | 
|  | 	unlock(&foo->private_lock); | 
|  | 	return 0; | 
|  | } | 
|  |  | 
|  | The important point is that after foo_io_completion() asks for an autosuspend, | 
|  | the foo_runtime_suspend() callback may race with foo_read_or_write(). | 
|  | Therefore foo_runtime_suspend() has to check whether there are any pending I/O | 
|  | requests (while holding the private lock) before allowing the suspend to | 
|  | proceed. | 
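The locking pattern in the pseudo-code can be exercised in user space. This sketch is not kernel code. A pthread mutex stands in for the driver's private lock, and a plain counter stands in for the pm_runtime_get() and pm_runtime_put_autosuspend() calls. The foo_model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the "foo" driver state. */
struct foo_model {
	pthread_mutex_t private_lock;	/* stands in for the private lock */
	int num_pending_requests;
	bool is_suspended;
	int usage_count;		/* models the run-time PM usage counter */
};

static void foo_model_init(struct foo_model *foo)
{
	pthread_mutex_init(&foo->private_lock, NULL);
	foo->num_pending_requests = 0;
	foo->is_suspended = false;
	foo->usage_count = 0;
}

/* New I/O request: take a usage reference for the first pending one. */
static void foo_model_submit(struct foo_model *foo)
{
	pthread_mutex_lock(&foo->private_lock);
	if (foo->num_pending_requests++ == 0)
		foo->usage_count++;		/* pm_runtime_get() */
	pthread_mutex_unlock(&foo->private_lock);
}

/* I/O completion: drop the reference when the queue drains. */
static void foo_model_complete(struct foo_model *foo)
{
	pthread_mutex_lock(&foo->private_lock);
	assert(foo->num_pending_requests > 0);
	if (--foo->num_pending_requests == 0)
		foo->usage_count--;		/* pm_runtime_put_autosuspend() */
	pthread_mutex_unlock(&foo->private_lock);
}

/* ->runtime_suspend(): refuse if a request slipped in after the put. */
static int foo_model_runtime_suspend(struct foo_model *foo)
{
	int ret = 0;

	pthread_mutex_lock(&foo->private_lock);
	if (foo->num_pending_requests > 0)
		ret = -EBUSY;
	else
		foo->is_suspended = true;	/* ... suspend the device ... */
	pthread_mutex_unlock(&foo->private_lock);
	return ret;
}

static void foo_model_runtime_resume(struct foo_model *foo)
{
	pthread_mutex_lock(&foo->private_lock);
	foo->is_suspended = false;		/* ... resume the device ... */
	pthread_mutex_unlock(&foo->private_lock);
}
```

The pending-request check and the change to is_suspended both happen under the same lock. A submitter therefore either sees the device still active or is counted as pending before the suspend decision is made.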
|  |  | 
|  | In addition, the power.autosuspend_delay field can be changed by user space at | 
|  | any time.  If a driver cares about this, it can call | 
|  | pm_runtime_autosuspend_expiration() from within the ->runtime_suspend() | 
|  | callback while holding its private lock.  If the function returns a nonzero | 
|  | value then the delay has not yet expired and the callback should return | 
|  | -EAGAIN. |
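The check can be sketched with a simplified model of the expiration computation. This illustration rests on several assumptions and is not the PM core's implementation. The as_* names are hypothetical, times are plain milliseconds rather than jiffies, negative delays are not modelled, and any rounding the real helper may apply is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified user-space model of the autosuspend fields. */
struct as_dev {
	bool use_autosuspend;			/* models power.use_autosuspend */
	unsigned int autosuspend_delay_ms;	/* models power.autosuspend_delay */
	unsigned long last_busy_ms;		/* models power.last_busy */
};

/* 0 if a suspend may proceed now, else the time at which the delay expires. */
static unsigned long as_expiration(const struct as_dev *dev, unsigned long now_ms)
{
	unsigned long expires;

	assert(dev != NULL);
	if (!dev->use_autosuspend)
		return 0;		/* behaves like the non-autosuspend helpers */
	expires = dev->last_busy_ms + dev->autosuspend_delay_ms;
	if (now_ms >= expires)
		return 0;		/* delay already elapsed */
	return expires;
}

/* Sketch of a ->runtime_suspend() that re-checks a possibly changed delay.
 * A real driver would hold its private lock around this check. */
static int as_runtime_suspend(struct as_dev *dev, unsigned long now_ms)
{
	if (as_expiration(dev, now_ms) != 0)
		return -EAGAIN;		/* not yet expired: try again later */
	/* ... suspend the device ... */
	return 0;
}
```

Because the delay is read again inside the callback, an increase written by user space after the suspend was scheduled still takes effect.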