mmc: msm_sdcc: fix race conditions in runtime PM
What is the race condition? (NOTE: RPM stands for Runtime Power Management)
1. SDCC is in RPM_SUSPENEDED state and SDCC platform suspend gets
triggered and then system goes into suspend.
2. During platform resume, SDCC platform resume is triggered
which blindly sets the pending_resume flag in SDCC host structure.
3. If new MMC transfer request comes then MMC block driver calls
mmc_claim_host() which internally calls msmsdcc_enable().
4. msmsdcc_enable() checks for following 3 conditions to be true:
1. host->sdcc_suspended == true
2. host->pending_resume == true
3. pm_runtime_suspended() == false
But as SDCC state is RPM_SUSPENDED, 3rd condition is not be satisfied
so msmsdcc_enable() don't clear the host->pending_resume flag and simply
calls pm_runtime_get_sync() to resume the SDCC. Once SDCC is resumed,
host->sdcc_suspended = false, runtime_status = RPM_ACTIVE but
host->pending_resume is still set.
5. Now after RPM idle timeout, SDCC runtime suspend gets triggered which
calls SDCC driver's runtime suspend callback (msmsdcc_runtime_suspend).
Note that before executing the RPM suspend calllback, SDCC RPM status
is set to RPM_SUSPENDING.
6. As SDCC driver's runtime suspend callback gets executed, it sets the
host->sdcc_suspended to true. But now before the RPM framework
sets the SDCC RPM status to RPM_SUSPENDED, on other active CPU core,
new MMC transfer reques comes in which would first call msmsdcc_enable()
and all of the 3 conditions below is true:
1. host->sdcc_suspended == true
2. host->pending_resume == true
3. pm_runtime_suspended() == false (RPM status is RPM_SUSPENDING)
As these conditions are satisfied, msmsdcc_enable() does not call
pm_runtime_get_sync(), instead just calls pm_runtime_get_noresume() and
msmsdcc_runtime_resume(). This means even after execution of
msmsdcc_enable(), SDCC RPM status is either RPM_SUSPENDING or
RPM_SUSPENDED.
7. RPM suspend framework on 1st core sets the SDCC RPM status to
RPM_SUSPENDED once SDCC runtime suspend callback returns.
8. Now once MMC transfer request is completed on other core, it will call
msmsdcc_disable(). This function calls pm_runtime_put_sync() which
returns -EAGAIN error as RPM status is already RPM_SUSPENED.
This error gets returned to MMC layer so MMC layer thinks that
SDCC is still enabled and skips call to msmsdcc_enable() until
msmsdcc_disable() succeeds.
8. Note when msmsdcc_disable() returned error, RPM usage_counter was set to
0 so next call to msmsdcc_disable() decrements RPM usage_counter to -1
(remember msmsdcc_enable() was skipped).
9. Now new MMC request comes in and it will first call msmsdcc_enable()
which calls pm_runtime_get_sync() and it sets the usage_counter to 0
and blindly resumes the SDCC as it was already suspended. After this
RPM status will be RPM_ACTIVE.
10. Once MMC request is processed, it will call smsdcc_disable() which
calls pm_runtime_put_sync() and it decrements RPM usage counter to -1
and skips scheduling runtime suspend callback as RPM usage counter
is not 0. RPM status remains in RPM_ACTIVE.
11. Now onwards for every new MMC transfer requests, 9 and 10 repeats and
SDCC always stays in RPM_ACTIVE state forever.
How is this race condition fixed?
Above race is created because host->pending_resume is remaining sticky.
Following changes are done to ensure that pending_resume gets set and
cleared at appropriate places:
1. In SDCC platform resume callback, set the host->pending_resume flag
only if RPM status is RPM_ACTIVE.
2. Clear the pending_resume flag once msmsdcc_runtime_resume() is
completed.
3. In msmsdcc_enable() function, if pending_resume flag is set skip calling
pm_runtime_get_sync() instead directly call msmsdcc_runtime_resume()
because RPM status is RPM_ACTIVE.
In addition, this patch adds WARN() messages in case of failures in
msmsdcc_enable() and msmsdcc_disable() functions to capture more details
in error conditions.
CRs-Fixed: 373338
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
(cherry picked from commit 386ad800716b6660d40c677b57ee4bf8e4430329)
Change-Id: I2c3b61c1b153cfed6db8c90ae3f7334b8830b5f7
Signed-off-by: Sudhir Sharma <sudsha@codeaurora.org>
diff --git a/drivers/mmc/host/msm_sdcc.c b/drivers/mmc/host/msm_sdcc.c
index 3dbf46f..804a6ed 100644
--- a/drivers/mmc/host/msm_sdcc.c
+++ b/drivers/mmc/host/msm_sdcc.c
@@ -3291,6 +3291,9 @@
{
struct device *dev = mmc_dev(host->mmc);
+ pr_info("%s: PM: sdcc_suspended=%d, pending_resume=%d, sdcc_suspending=%d\n",
+ mmc_hostname(host->mmc), host->sdcc_suspended,
+ host->pending_resume, host->sdcc_suspending);
pr_info("%s: RPM: runtime_status=%d, usage_count=%d,"
" is_suspended=%d, disable_depth=%d, runtime_error=%d,"
" request_pending=%d, request=%d\n",
@@ -3312,8 +3315,7 @@
if (mmc->card && mmc_card_sdio(mmc->card))
goto out;
- if (host->sdcc_suspended && host->pending_resume &&
- !pm_runtime_suspended(dev)) {
+ if (host->sdcc_suspended && host->pending_resume) {
host->pending_resume = false;
pm_runtime_get_noresume(dev);
rc = msmsdcc_runtime_resume(dev);
@@ -3334,8 +3336,8 @@
skip_get_sync:
if (rc < 0) {
- pr_info("%s: %s: failed with error %d", mmc_hostname(mmc),
- __func__, rc);
+ WARN(1, "%s: %s: failed with error %d\n", mmc_hostname(mmc),
+ __func__, rc);
msmsdcc_print_rpm_info(host);
return rc;
}
@@ -3361,15 +3363,9 @@
rc = pm_runtime_put_sync(mmc->parent);
- /*
- * Ignore -EAGAIN as that is not fatal, it means that
- * either runtime usage count is non-zero or the runtime
- * pm itself is disabled or not in proper state to process
- * idle notification.
- */
- if (rc < 0 && (rc != -EAGAIN)) {
- pr_info("%s: %s: failed with error %d", mmc_hostname(mmc),
- __func__, rc);
+ if (rc < 0) {
+ WARN(1, "%s: %s: failed with error %d\n", mmc_hostname(mmc),
+ __func__, rc);
msmsdcc_print_rpm_info(host);
return rc;
}
@@ -6261,6 +6257,7 @@
wake_unlock(&host->sdio_suspend_wlock);
}
+ host->pending_resume = false;
pr_debug("%s: %s: end\n", mmc_hostname(mmc), __func__);
return 0;
}
@@ -6333,7 +6330,13 @@
if (mmc->card && mmc_card_sdio(mmc->card))
rc = msmsdcc_runtime_resume(dev);
- else
+ /*
+ * As runtime PM is enabled before calling the device's platform resume
+ * callback, we use the pm_runtime_suspended API to know if SDCC is
+ * really runtime suspended or not and set the pending_resume flag only
+ * if its not runtime suspended.
+ */
+ else if (!pm_runtime_suspended(dev))
host->pending_resume = true;
if (host->plat->status_irq) {