[RFC PATCH] usb: hub: Disable autosuspend before disabling usb device
- Date: Fri, 17 Mar 2017 09:45:55 -0700
- From: Guenter Roeck <linux@xxxxxxxxxxxx>
- Subject: [RFC PATCH] usb: hub: Disable autosuspend before disabling usb device
While running a bind/unbind stress test with the dwc3 usb driver on rk3399,
the following crash was observed.
Unable to handle kernel NULL pointer dereference at virtual address 00000218
pgd = ffffffc00165f000
 *pgd=000000000174f003, *pud=000000000174f003,
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm
xt_mark fuse bridge stp llc zram btusb btrtl btbcm btintel bluetooth
ip6table_filter mwifiex_pcie mwifiex cfg80211 cdc_ether usbnet r8152 mii joydev
snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async
ppp_generic slhc tun
CPU: 1 PID: 29814 Comm: kworker/1:1 Not tainted 4.4.52 #507
Hardware name: Google Kevin (DT)
Workqueue: pm pm_runtime_work
task: ffffffc0ac540000 ti: ffffffc0af4d4000 task.ti: ffffffc0af4d4000
PC is at autosuspend_check+0x74/0x174
LR is at autosuspend_check+0x70/0x174
(gdb) l *0xffffffc00080dcc0
0xffffffc00080dcc0 is in autosuspend_check
1773			/* We don't need to check interfaces that are
1774			 * disabled for runtime PM.  Either they are unbound
1775			 * or else their drivers don't support autosuspend
1776			 * and so they are permanently active.
1777			 */
1778			if (intf->dev.power.disable_depth)
1779				continue;
1780			if (atomic_read(&intf->dev.power.usage_count) > 0)
1781				return -EBUSY;
1782			w |= intf->needs_remote_wakeup;
Code analysis shows that intf is set to NULL in usb_disable_device() prior
to setting actconfig to NULL. At the same time, usb_runtime_idle() does not
lock the usb device, and neither do any of the functions in the
traceback. This means that there is no protection against a race condition
where usb_disable_device() is removing dev->actconfig->interface pointers
while those are being accessed from autosuspend_check() and possibly by
Explicitly disable autosuspend in usb_disconnect() before calling
usb_disable_device(). This doesn't fix the race for good, but it ensures
that the pm runtime worker doesn't call usb_runtime_idle() on the interface
that is being removed, and thus avoids the race in the affected code path.
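
To illustrate the ordering described above, here is a minimal user-space
sketch. It is not the patch itself and not kernel code: every model_* name
and the autosuspend_allowed flag are hypothetical stand-ins. The idea it
models is that the idle check bails out once autosuspend has been disabled,
so it no longer walks interface pointers that are being cleared.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_INTF 4

/* Hypothetical stand-ins for struct usb_interface / struct usb_device. */
struct model_intf {
	int usage_count;
};

struct model_udev {
	atomic_bool autosuspend_allowed;	/* models the runtime PM gate */
	size_t num_intf;
	struct model_intf *interface[MODEL_MAX_INTF];
};

/*
 * Models the idle-time check. Returns 0 if the device looks idle,
 * -EBUSY if an interface is in use, and -EACCES if autosuspend was
 * disabled, in which case the interface array is never touched.
 */
int model_autosuspend_check(struct model_udev *udev)
{
	if (!atomic_load(&udev->autosuspend_allowed))
		return -EACCES;

	for (size_t i = 0; i < udev->num_intf; i++) {
		/* Without the gate above, a NULL entry would crash here. */
		if (udev->interface[i]->usage_count > 0)
			return -EBUSY;
	}
	return 0;
}

/* Models disconnect: close the gate first, then clear the pointers. */
void model_disconnect(struct model_udev *udev)
{
	atomic_store(&udev->autosuspend_allowed, false);
	for (size_t i = 0; i < udev->num_intf; i++)
		udev->interface[i] = NULL;
}

/* Single-threaded walk through the three states; returns 0 on success. */
int model_demo(void)
{
	struct model_intf a = { .usage_count = 0 };
	struct model_intf b = { .usage_count = 0 };
	struct model_udev udev = { .num_intf = 2, .interface = { &a, &b } };

	atomic_init(&udev.autosuspend_allowed, true);
	assert(model_autosuspend_check(&udev) == 0);

	b.usage_count = 1;
	assert(model_autosuspend_check(&udev) == -EBUSY);

	model_disconnect(&udev);
	assert(model_autosuspend_check(&udev) == -EACCES);
	return 0;
}
```

Note that a checker which has already passed the gate can still race with
the teardown loop in this model, which is consistent with the statement
above that this narrows the window rather than closing it.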
Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx>
---
This is another interesting situation. As mentioned above, the patch doesn't
really fix the race problem. On the other hand, fixing it for good would
(probably) be much more complex. I still see the race after applying this
patch, but it happens maybe once a day vs. several times per hour.
Marked as RFC in the hope that someone has an idea for a better fix.
I tried clearing udev->actconfig prior to removing the interfaces
in usb_disable_device(), but that alone didn't help; it does not
resolve the race condition either, and still results in the crash.
The only clean solution I can think of would be to protect accesses
to dev->actconfig with a spinlock or mutex, and to make sure that the
lock is held during read accesses and that dev->actconfig is cleared
before releasing the lock on write accesses. I'll be happy to do that
if it is the way to go, but I would like some feedback before I give it
a try.
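
The locking idea in the preceding paragraph could look roughly like the
following user-space sketch. It uses a pthread mutex as a stand-in for
whatever lock the kernel would use; config_lock and all locked_* names are
hypothetical and do not exist in the USB core. The demo is single-threaded
and only shows the locking discipline, not an actual concurrent run.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define LOCKED_MAX_INTF 4

/* Hypothetical stand-ins for the kernel structures. */
struct locked_intf {
	int usage_count;
};

struct locked_config {
	size_t num_intf;
	struct locked_intf *interface[LOCKED_MAX_INTF];
};

struct locked_udev {
	pthread_mutex_t config_lock;	/* hypothetical lock guarding actconfig */
	struct locked_config *actconfig;
};

/* Reader: holds the lock for the whole walk over the interfaces. */
int locked_autosuspend_check(struct locked_udev *udev)
{
	int ret = 0;

	pthread_mutex_lock(&udev->config_lock);
	if (udev->actconfig) {
		for (size_t i = 0; i < udev->actconfig->num_intf; i++) {
			if (udev->actconfig->interface[i]->usage_count > 0) {
				ret = -EBUSY;
				break;
			}
		}
	}
	pthread_mutex_unlock(&udev->config_lock);
	return ret;
}

/*
 * Writer: clears actconfig before releasing the lock, so a reader
 * either sees the complete configuration or none at all. The
 * interface pointers are cleared only after they are unreachable.
 */
void locked_disable_device(struct locked_udev *udev)
{
	struct locked_config *old;

	pthread_mutex_lock(&udev->config_lock);
	old = udev->actconfig;
	udev->actconfig = NULL;
	pthread_mutex_unlock(&udev->config_lock);

	if (old) {
		for (size_t i = 0; i < old->num_intf; i++)
			old->interface[i] = NULL;
	}
}

/* Single-threaded walk through the states; returns 0 on success. */
int locked_demo(void)
{
	struct locked_intf a = { .usage_count = 1 };
	struct locked_config cfg = { .num_intf = 1, .interface = { &a } };
	struct locked_udev udev = { .actconfig = &cfg };

	pthread_mutex_init(&udev.config_lock, NULL);
	assert(locked_autosuspend_check(&udev) == -EBUSY);

	locked_disable_device(&udev);
	assert(locked_autosuspend_check(&udev) == 0);	/* nothing left to check */

	pthread_mutex_destroy(&udev.config_lock);
	return 0;
}
```

Whether a mutex or a spinlock fits depends on the contexts all readers of
dev->actconfig run in, which this sketch does not attempt to model.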
drivers/usb/core/hub.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 5286bf67869a..5a420657f9f7 100644
@@ -2093,6 +2093,15 @@ void usb_disconnect(struct usb_device **pdev)
* so that the hardware is now fully quiesced.
dev_dbg(&udev->dev, "unregistering device\n");
+ * Disable autosuspend before disabling the device, and make sure
+ * that autosuspend doesn't touch it while it is in the process
+ * of being deleted.