Web lists-archives.com

Re: [PATCH 1/2] PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()




On 12 January 2018 at 14:10, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> There are problems with calling pm_runtime_force_suspend/resume()
> to "stop" and "start" devices in genpd_finish_suspend() and
> genpd_resume_noirq() (and in analogous hibernation-specific genpd
> callbacks) after commit 122a22377a3d (PM / Domains: Stop/start
> devices during system PM suspend/resume in genpd) as those routines
> do much more than just "stopping" and "starting" devices (which was
> the stated purpose of that commit) unnecessarily and may not play
> well with system-wide PM driver callbacks.
>
> First, consider the pm_runtime_force_suspend() in
> genpd_finish_suspend().  If the current runtime PM status of the
> device is "suspended", that function most likely does the right thing
> by ignoring the device, because it should have been "stopped" already
> and whatever needed to be done to deactivate it shoud have been done.
> In turn, if the runtime PM status of the device is "active",
> genpd_runtime_suspend() is called for it (indirectly) and (1) runs
> the ->runtime_suspend callback provided by the device's driver
> (assuming no bus type with ->runtime_suspend of its own), (2) "stops"
> the device and (3) checks if the domain can be powered down, and then
> (4) the device's runtime PM status is changed to "suspended".  Out of
> the four actions above (1) is not necessary and it may be outright
> harmful, (3) is pointless and (4) is questionable.  The only
> operation that needs to be carried out here is (2).
>
> The reason why (1) is not necessary is because the system-wide
> PM callbacks provided by the device driver for the transition in
> question have been run and they should have taken care of the
> driver's part of device suspend already.  Moreover, it may be
> harmful, because the ->runtime_suspend callback may want to
> access the device which is partially suspended at that point
> and may not be responsive.  Also, system-wide PM callbacks may
> have been run already (in the previous phases of the system
> transition under way) for the device's parent or for its supplier
> devices (if any) and the device may not be accessible because of
> that.
>
> There also is no reason to do (3), because genpd_finish_suspend()
> will repeat it anyway, and (4) potentially causes confusion to ensue
> during the subsequent system transition to the working state.
>
> Consider pm_runtime_force_resume() in genpd_resume_noirq() now.
> It runs genpd_runtime_resume() for all devices with runtime PM
> status set to "suspended", which includes all of the devices
> whose runtime PM status was changed by pm_runtime_force_suspend()
> before and may include some devices already suspended when the
> pm_runtime_force_suspend() was running, which may be confusing.  The
> genpd_runtime_resume() first tries to power up the domain, which
> (again) is pointless, because genpd_resume_noirq() has done that
> already.  Then, it "starts" the device and runs the ->runtime_resume
> callback (from the driver, say) for it.  If all is well, the device
> is left with the runtime PM status set to "active".
>
> Unfortunately, running the driver's ->runtime_resume callback
> before its system-wide PM callbacks and possibly before some
> system-wide PM callbacks of the parent device's driver (let
> alone supplier drivers) is asking for trouble, especially if
> the device had been suspended before pm_runtime_force_suspend()
> ran previously or if the callbacks in question expect to be run
> back-to-back with their suspend-side counterparts.  It also should
> not be necessary, because the system-wide PM driver callbacks that
> will be invoked for the device subsequently should take care of
> resuming it just fine.
>
> [Running the driver's ->runtime_resume callback in the "noirq"
> phase of the transition to the working state may be problematic
> even for devices whose drivers do use pm_runtime_force_resume()
> in (or as) their system-wide PM callbacks if they have suppliers
> other than their parents, because it may cause the supplier to
> be resumed after the consumer in some cases.]
>
> Because of the above, modify genpd as follows:
>
>  1. Change genpd_finish_suspend() to only "stop" devices with
>     runtime PM status set to "active" (without invoking runtime PM
>     callbacks for them, changing their runtime PM status and so on).
>
>     That doesn't change the handling of devices whose drivers use
>     pm_runtime_force_suspend/resume() in (or as) their system-wide
>     PM callbacks and addresses the issues described above for the
>     other devices.
>
>  2. Change genpd_resume_noirq() to only "start" devices with
>     runtime PM status set to "active" (without invoking runtime PM
>     callbacks for them, changing their runtime PM status and so on).
>
>     Again, that doesn't change the handling of devices whose drivers
>     use pm_runtime_force_suspend/resume() in (or as) their system-wide
>     PM callbacks and addresses the described issues for the other
>     devices.  Devices with runtime PM status set to "suspended"
>     are not started with the assumption that they will be resumed
>     later, either by pm_runtime_force_resume() or via runtime PM.
>
>  3. Change genpd_restore_noirq() to follow genpd_resume_noirq().
>
>     That causes devices already suspended before hibernation to be
>     left alone (which also is the case without the change) and
>     avoids running the ->runtime_resume driver callback too early
>     for the other devices.
>
>  4. Change genpd_freeze_noirq() and genpd_thaw_noirq() in accordance
>     with the above modifications.
>
> Fixes: 122a22377a3d (PM / Domains: Stop/start devices during system PM suspend/resume in genpd)
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>

That was an extensive changlog, thanks for the details and for working on this!

Acked-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>

Kind regards
Uffe

> ---
>  drivers/base/power/domain.c |   25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
>
> Index: linux-pm/drivers/base/power/domain.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/domain.c
> +++ linux-pm/drivers/base/power/domain.c
> @@ -1048,8 +1048,9 @@ static int genpd_finish_suspend(struct d
>         if (dev->power.wakeup_path && genpd_is_active_wakeup(genpd))
>                 return 0;
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> -               ret = pm_runtime_force_suspend(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev)) {
> +               ret = genpd_stop_dev(genpd, dev);
>                 if (ret) {
>                         if (poweroff)
>                                 pm_generic_restore_noirq(dev);
> @@ -1106,8 +1107,9 @@ static int genpd_resume_noirq(struct dev
>         genpd->suspended_count--;
>         genpd_unlock(genpd);
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> -               ret = pm_runtime_force_resume(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev)) {
> +               ret = genpd_start_dev(genpd, dev);
>                 if (ret)
>                         return ret;
>         }
> @@ -1139,8 +1141,9 @@ static int genpd_freeze_noirq(struct dev
>         if (ret)
>                 return ret;
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start)
> -               ret = pm_runtime_force_suspend(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev))
> +               ret = genpd_stop_dev(genpd, dev);
>
>         return ret;
>  }
> @@ -1163,8 +1166,9 @@ static int genpd_thaw_noirq(struct devic
>         if (IS_ERR(genpd))
>                 return -EINVAL;
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> -               ret = pm_runtime_force_resume(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev)) {
> +               ret = genpd_start_dev(genpd, dev);
>                 if (ret)
>                         return ret;
>         }
> @@ -1221,8 +1225,9 @@ static int genpd_restore_noirq(struct de
>         genpd_sync_power_on(genpd, true, 0);
>         genpd_unlock(genpd);
>
> -       if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> -               ret = pm_runtime_force_resume(dev);
> +       if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> +           !pm_runtime_status_suspended(dev)) {
> +               ret = genpd_start_dev(genpd, dev);
>                 if (ret)
>                         return ret;
>         }
>