
[Samba] Need help troubleshooting TCP thrashing, possible kernel bug?




I have a FreeNAS 9.3 server running Samba Version 4.3.6 and a bunch of
Windows and Linux clients. Everything's been running fine for a while and
nothing changed on the server.

Recently (Jan 27th) some of the Archlinux clients updated from a 4.8.x
kernel to a 4.9.x kernel. Again, things ran fine. Then on Jan 30th around
2am the Archlinux clients using 4.9.x kernels and utilizing mount.cifs to
access samba shares began thrashing on TCP port 445, causing high CPU load
on the server. These machines now cause thrashing after 15-20 minutes
whenever a share is mounted using mount.cifs.

When it's thrashing, I see thousands of open sockets from a single client:
# sockstat -4 | grep 10.0.1.87 | wc
   10013   70091  740962
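
To see whether one machine is responsible, a per-client tally may be easier to read than a raw line count. The following is only a rough sketch: the helper name conns_per_client is made up for illustration, and it assumes the FreeBSD sockstat column layout (USER COMMAND PID FD PROTO LOCAL FOREIGN), so that the foreign address is field 7.

```shell
# Tally connections per remote IP from `sockstat -4` output on stdin.
# Assumption: the foreign address is field 7 and looks like IP:PORT.
# Header lines and listening sockets (foreign address *:*) are skipped
# because they do not end in :<digits>.
conns_per_client() {
    awk '$7 ~ /:[0-9]+$/ {
             ip = $7
             sub(/:[0-9]+$/, "", ip)   # drop the :port suffix
             count[ip]++
         }
         END { for (ip in count) print count[ip], ip }' | sort -rn
}

# Usage on the server:
#   sockstat -4 | conns_per_client | head
```

Each output line is a connection count followed by the client IP, busiest client first.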

And on the client, the source port is constantly changing:
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:53122         10.0.0.8:445            ESTABLISHED 0          1253359
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:53700         10.0.0.8:445            ESTABLISHED 0          1253439
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:53926         10.0.0.8:445            ESTABLISHED 0          1254557
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:54148         10.0.0.8:445            ESTABLISHED 0          1253578
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:54352         10.0.0.8:445            ESTABLISHED 0          1253604
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:54518         10.0.0.8:445            ESTABLISHED 0          1254685
$ netstat -net | grep 10.0.0.8
tcp        0      0 10.0.1.87:54698         10.0.0.8:445            ESTABLISHED 0          1252177
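
Rather than re-running netstat by hand, the source-port churn could be logged with a small filter. This is a sketch under assumptions, not a tested tool: local_ports_to is a hypothetical helper name, and it assumes the Linux net-tools layout for `netstat -net` with each connection on a single line (field 4 = local address, field 5 = foreign address, field 6 = state).

```shell
# Print the local source port of each ESTABLISHED connection to a given
# remote endpoint, reading `netstat -net` output on stdin.
# $1 is the remote endpoint as IP:PORT, e.g. 10.0.0.8:445.
local_ports_to() {
    awk -v remote="$1" '$5 == remote && $6 == "ESTABLISHED" {
             n = split($4, parts, ":")
             print parts[n]            # last component is the port
         }'
}

# Usage on the client: sample once per second, print a port only when it
# differs from the previous sample.
#   while sleep 1; do netstat -net | local_ports_to 10.0.0.8:445; done | uniq
```

On a healthy mount the loop should print a single port and then stay quiet; a steady stream of new ports would match the reconnect behaviour shown above.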

As a workaround, I can downgrade these client machines to any 4.8.x kernel
and the issue goes away. My suspicion is that something is odd in my smb.conf
and a change in the 4.9.x kernels exposes it. Or maybe a bug was introduced
in 4.9 and our setup triggers it.

I've built 4.10rc kernels from Linus's git repo and they also have the
problem. The 4.9 kernel I built from Linus's git has the problem, but the
4.8 kernel I built does not, so I don't think it's related to any patching
done by Archlinux. I don't understand why the issue didn't happen
immediately after upgrading kernels on the 27th, but now it very
consistently acts up after less than 20 minutes.

Attached is the smb.conf used on one of my FreeNAS servers. I was able to
copy that config to an Archlinux system running Samba version 4.5.3
(commenting out lines 24, 25, 55, and 79 and adjusting the "interfaces ="
line) and the problem persists, so it doesn't appear to be specific to
FreeNAS or Samba 4.3.6.

--
Paul Klapperich