Web lists-archives.com

Re: [PATCH] drm/rockchip: Allow driver to be shutdown on reboot/kexec




Hi Brian,

Am Mittwoch, 5. Dezember 2018, 04:01:34 CET schrieb Brian Norris:
> + others
> 
> Hi,
> 
> On Sun, Aug 05, 2018 at 01:48:07PM +0100, Marc Zyngier wrote:
> > Leaving the DRM driver enabled on reboot or kexec has the annoying
> > effect of leaving the display generating transactions whilst the
> > IOMMU has been shut down.
> > 
> > In turn, the IOMMU driver (which shares its interrupt line with
> > the VOP) starts warning either on shutdown or when entering the
> > secondary kernel in the kexec case (nothing is expected on that
> > front).
> > 
> > A cheap way of ensuring that things are nicely shut down is to
> > register a shutdown callback in the platform driver.
> > 
> > Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> > ---
> 
> This patch made it into 4.20-rc1 as well as -stable, and it has caused
> regressions for me, on the Kevin and Scarlet [1] RK3399 platforms.
>
> On
> shutdown/reboot, I see this:
> 
> [   94.742559] WARNING: CPU: 4 PID: 2035 at
> drivers/gpu/drm/drm_mode_config.c:477 drm_mode_config_cleanup+0x1c4/0x294
> ...
> [   94.775904] CPU: 4 PID: 2035 Comm: reboot Tainted: G        W        
> 4.20.0-rc5+ #83 [   94.784651] Hardware name: Google Scarlet (DT)
> [   94.789611] pstate: 20000005 (nzCv daif -PAN -UAO)
> [   94.794959] pc : drm_mode_config_cleanup+0x1c4/0x294
> [   94.800500] lr : drm_mode_config_cleanup+0x108/0x294
> ...
> [   94.898683] Call trace:
> [   94.901410]  drm_mode_config_cleanup+0x1c4/0x294
> [   94.906565]  rockchip_drm_unbind+0x4c/0x8c
> [   94.911138]  component_master_del+0x88/0xb8
> [   94.915807]  rockchip_drm_platform_remove+0x2c/0x44
> [   94.921243]  rockchip_drm_platform_shutdown+0x20/0x2c
> [   94.926881]  platform_drv_shutdown+0x2c/0x38
> [   94.931647]  device_shutdown+0x164/0x1b8
> [   94.936016]  kernel_restart_prepare+0x40/0x48
> [   94.940878]  kernel_restart+0x20/0x68
> [   94.944964]  __se_sys_reboot+0x1ac/0x204
> [   94.949331]  __arm64_sys_reboot+0x2c/0x38
> [   94.953806]  el0_svc_common+0xa4/0xec
> [   94.957891]  el0_svc_compat_handler+0x30/0x3c
> [   94.962753]  el0_svc_compat+0x8/0x18
> [   94.966740] ---[ end trace b9ba2e701f4fb233 ]---
> [   95.255169] Memory manager not clean during takedown.
> [   95.260824] WARNING: CPU: 4 PID: 2035 at drivers/gpu/drm/drm_mm.c:950
> drm_mm_takedown+0x34/0x44 ...
> [   95.292314] CPU: 4 PID: 2035 Comm: reboot Tainted: G        W        
> 4.20.0-rc5+ #83 [   95.301061] Hardware name: Google Scarlet (DT)
> [   95.306020] pstate: 60000005 (nZCv daif -PAN -UAO)
> [   95.311369] pc : drm_mm_takedown+0x34/0x44
> [   95.315940] lr : drm_mm_takedown+0x34/0x44
> ...
> [   95.415857]  drm_mm_takedown+0x34/0x44
> [   95.420042]  rockchip_drm_unbind+0x64/0x8c
> [   95.424613]  component_master_del+0x88/0xb8
> [   95.429283]  rockchip_drm_platform_remove+0x2c/0x44
> [   95.434728]  rockchip_drm_platform_shutdown+0x20/0x2c
> [   95.440360]  platform_drv_shutdown+0x2c/0x38
> [   95.445127]  device_shutdown+0x164/0x1b8
> [   95.449504]  kernel_restart_prepare+0x40/0x48
> [   95.454358]  kernel_restart+0x20/0x68
> [   95.458436]  __se_sys_reboot+0x1ac/0x204
> [   95.462812]  __arm64_sys_reboot+0x2c/0x38
> [   95.467287]  el0_svc_common+0xa4/0xec
> [   95.471373]  el0_svc_compat_handler+0x30/0x3c
> [   95.476235]  el0_svc_compat+0x8/0x18
> [   95.480215] ---[ end trace b9ba2e701f4fb234 ]---
> 
> It's especially bad on -stable kernels, where I believe the remove()
> paths were even worse. This triggers a variety of OOPSes, and it's not
> clear if those are simply because of backports (e.g., RK3399 did not
> have support in 4.4.y, but our downstream has merged all sorts of
> backports to make it work).
> 
> Anyway, the above warnings occur on v4.20-rc, which I think is
> justification enough for a revert.

That's strange. I remember testing quite a number of shutdown/reboot
cycles before applying that patch. And for good measure did the same
again right now.

- Kevin, with netboot firmware, booting into Debian+console only
- Bob, with stock firmware, booting into Debian+KDE Plasma
- Scarlet, with stock firmware, booting into Debian+KDE Plasma

With some random number of reboot and shutdowns on each I didn't
see any warnings at all.

> I plan to submit a revert which I hope can go to 4.20 as well as
> -stable. I'd hope the remove()/shutdown() paths should be fixed before
> this gets applied again, and that it does not get shipped to -stable
> kernels.

But judging by the fact that the warning indicates that somthing is still
holding onto a framebuffer and a rmmod rockchipdrm is not possible
at runrtime for likely the same reason, I guess we really might be creating
a problem with that shutdown.

Can you maybe give "drm/rockchip: shutdown drm subsystem on shutdown" [2]
a try? When the underlying issue of rebooting surfaced we had 2 competing 
solutions, so we at least don't reopen the issue, that people have problems
rebooting?

drm-drivers seem to be short on shutdown handlers to peek at but this
second variant mimics what tinydrm does in its shutdown
(calling drm_atomic_helper_shutdown), so at least shouldn't be completely
wrong.


[2] https://patchwork.kernel.org/patch/10556151/


Heiko


> [1] Technically Scarlet needed a few patches from -next to work at all,
>     but Kevin is a similar platform that has been working for several
>     releases.
> 
> >  drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c index
> > f814d37b1db2..05368fa4f956 100644
> > --- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > +++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > @@ -442,6 +442,11 @@ static int rockchip_drm_platform_remove(struct
> > platform_device *pdev)> 
> >  	return 0;
> >  
> >  }
> > 
> > +static void rockchip_drm_platform_shutdown(struct platform_device *pdev)
> > +{
> > +	rockchip_drm_platform_remove(pdev);
> > +}
> > +
> > 
> >  static const struct of_device_id rockchip_drm_dt_ids[] = {
> >  
> >  	{ .compatible = "rockchip,display-subsystem", },
> >  	{ /* sentinel */ },
> > 
> > @@ -451,6 +456,7 @@ MODULE_DEVICE_TABLE(of, rockchip_drm_dt_ids);
> > 
> >  static struct platform_driver rockchip_drm_platform_driver = {
> >  
> >  	.probe = rockchip_drm_platform_probe,
> >  	.remove = rockchip_drm_platform_remove,
> > 
> > +	.shutdown = rockchip_drm_platform_shutdown,
> > 
> >  	.driver = {
> >  	
> >  		.name = "rockchip-drm",
> >  		.of_match_table = rockchip_drm_dt_ids,