Web lists-archives.com

[Samba] "wbinfo -u" considered harmful towards Winbindd...




Our setup:
Windows AD realm with ~115K users (and numerous groups etc)
FreeBSD servers with Samba 4.7.6 and Samba 4.9.3 (both show the same growth)

We just noticed that one of the ‘winbindd’ daemons on the servers seems to be growing and growing forever. A bit of detective work pointed us at the “wbinfo -u” command being that culprit. As part of a systems monitoring script we ran that once a minute (now disabled) in order to see if all AD users were detected, but somehow that seems to fail sometime and also cause the Winbindd daemon to grow around 455MB per hour… the memory used is not a huge problem on the production servers (they have 256GB RAM) so we didn’t notice this at first (since we restart smbd&winbindd every morning at 7am) - but an old test server with much less RAM ran out of memory around 4:30am… :-)

This is what the process growth look like with a cron job running “wbinfo -u” once a minute:

> 10:50/ps-auxwww.log:root      7658    0.0  0.7 1999680 1861712  -  S    07:00       21:54.66 winbindd: domain child [AD] (winbindd)
> 11:00/ps-auxwww.log:root      7658    0.0  0.7 2077504 1943248  -  I    07:00       22:48.07 winbindd: domain child [AD] (winbindd)
> 11:10/ps-auxwww.log:root      7658    0.0  0.8 2157376 2022720  -  S    07:00       23:41.83 winbindd: domain child [AD] (winbindd)
> 11:20/ps-auxwww.log:root      7658   95.3  0.8 2218816 2063896  -  R    07:00       24:26.61 winbindd: domain child [AD] (winbindd)
> 11:30/ps-auxwww.log:root      7658    0.0  0.8 2243392 2107712  -  I    07:00       24:39.43 winbindd: domain child [AD] (winbindd)
> 11:40/ps-auxwww.log:root      7658    0.0  0.8 2321216 2185384  -  S    07:00       25:35.37 winbindd: domain child [AD] (winbindd)
> 11:50/ps-auxwww.log:root      7658    0.0  0.9 2452288 2317168  -  S    07:00       27:21.18 winbindd: domain child [AD] (winbindd)
> 12:00/ps-auxwww.log:root      7658    0.0  0.9 2534208 2399384  -  S    07:00       28:15.81 winbindd: domain child [AD] (winbindd)
> 12:10/ps-auxwww.log:root      7658   96.4  0.9 2587456 2433792  -  R    07:00       29:52.67 winbindd: domain child [AD] (winbindd)
> 12:20/ps-auxwww.log:root      7658    0.0  0.9 2609984 2474460  -  S    07:00       30:07.13 winbindd: domain child [AD] (winbindd)
> 12:30/ps-auxwww.log:root      7658    0.0  1.0 2736960 2672352  -  S    07:00       32:03.49 winbindd: domain child [AD] (winbindd)
> 12:40/ps-auxwww.log:root      7658    0.0  1.0 2736960 2672744  -  S    07:00       32:03.51 winbindd: domain child [AD] (winbindd)


This is a listing of the user listings from “wbinfo -u” during that same time:

> -rw-r--r--  1 root  wheel        0 Dec  9 10:48 10:48/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 10:49 10:49/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 10:50 10:50/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 10:51 10:51/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 10:52 10:52/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 10:53 10:53/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 10:54 10:54/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 10:55 10:55/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 10:56 10:56/wbinfo-u.log
> -rw-r--r--  1 root  wheel               0 Dec  9 10:57 10:57/wbinfo-u.log
> -rw-r--r--  1 root  wheel               0 Dec  9 10:58 10:58/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 10:59 10:59/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:00 11:00/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:01 11:01/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:02 11:02/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:03 11:03/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:04 11:04/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:05 11:05/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:06 11:06/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:07 11:07/wbinfo-u.log
> -rw-r--r--  1 root  wheel              0 Dec  9 11:08 11:08/wbinfo-u.log
> -rw-r--r--  1 root  wheel              0 Dec  9 11:09 11:09/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:10 11:10/wbinfo-u.log
> -rw-r--r--  1 root  wheel  1062356 Dec  9 11:11 11:11/wbinfo-u.log

An attempt I just made at manually running the “wbinfo -u” command gave:

> # ps uxwwww 7658
> USER  PID %CPU %MEM    VSZ    RSS TT  STAT STARTED    TIME COMMAND
> root      7658    0.0  1.0 2736960 2675044  -  S    07:00       32:04.70 winbindd: domain child [AD] (winbindd)

> # /usr/bin/time /liu/bin/wbinfo -u > /tmp/wbinfo-u.out
> Error looking up domain users
>        64.82 real         0.00 user         0.00 sys

> # ls -l /tmp/wbinfo-u.out
> -rw-r--r--  1 root  wheel  0 Dec  9 19:50 /tmp/wbinfo-u.out

> # ps uxwwww 7658
> USER  PID %CPU %MEM    VSZ    RSS TT  STAT STARTED    TIME COMMAND

> root      7658    0.0  1.0 2775872 2715664  -  I    07:00       33:13.21 winbindd: domain child [AD] (winbindd)


I have saved a gcore-generated core dump from one of the big Winbindd:s in case it might be of use in order to pin-point the leak. Here’s a GDB stack trace (doubt that gives much info though):

> Reading symbols from /liu/sbin/winbindd...done.
> [New LWP 102857]
> Core was generated by `winbindd: domain child [AD]'.
> #0  0x0000000809b48948 in ?? () from /lib/libc.so.7
> (gdb) bt
> #0  0x0000000809b48948 in ?? () from /lib/libc.so.7
> #1  0x0000000809bad3c5 in snprintf () from /lib/libc.so.7
> #2  0x0000000806852f56 in namemap_cache_set_sid2name (sid=sid@entry=0x7fffffffccf0, domain=<optimized out>, domain@entry=0x8be412930 "AD", name=name@entry=0x8becdef10 "annha585",
>     type=type@entry=SID_NAME_USER, timeout=1544354765) at ../source3/lib/namemap_cache.c:74
> #3  0x0000000001054ff6 in wcache_save_sid_to_name (domain=domain@entry=0x812669fe0, status=..., sid=sid@entry=0x7fffffffccf0, domain_name=0x8be412930 "AD", name=0x8becdef10 "annha585",
>     type=SID_NAME_USER) at ../source3/winbindd/winbindd_cache.c:969
> #4  0x0000000001059b0f in wb_cache_rids_to_names (domain=domain@entry=0x812669fe0, mem_ctx=0x81261d660, domain_sid=domain_sid@entry=0x8b3e881e0, rids=rids@entry=0x8b800dfa0,
>     num_rids=num_rids@entry=117864, domain_name=domain_name@entry=0x7fffffffcd90, names=0x7fffffffcd98, types=0x7fffffffcda0) at ../source3/winbindd/winbindd_cache.c:2202
> #5  0x000000000107b4d0 in _wbint_LookupRids (p=p@entry=0x7fffffffce30, r=r@entry=0x8a9444080) at ../source3/winbindd/winbindd_dual_srv.c:660
> #6  0x00000000010cf243 in api_wbint_LookupRids (p=0x7fffffffce30) at default/librpc/gen_ndr/srv_winbind.c:1353
> #7  0x000000000107a345 in winbindd_dual_ndrcmd (domain=<optimized out>, state=0x7fffffffd0c8) at ../source3/winbindd/winbindd_dual_ndr.c:369
> #8  0x0000000001076082 in child_process_request (state=0x7fffffffd0c8, child=<optimized out>) at ../source3/winbindd/winbindd_dual.c:745
> #9  child_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=0x7fffffffd0c0) at ../source3/winbindd/winbindd_dual.c:1602
> #10 0x000000080984bafd in tevent_common_invoke_fd_handler () from /usr/local/lib/libtevent.so.0
> #11 0x000000080984e85b in ?? () from /usr/local/lib/libtevent.so.0
> #12 0x000000080984ae4e in _tevent_loop_once () from /usr/local/lib/libtevent.so.0
> #13 0x00000000010797de in fork_domain_child (child=0x812681760) at ../source3/winbindd/winbindd_dual.c:1817
> #14 0x0000000001079956 in wb_child_request_waited (subreq=0x0) at ../source3/winbindd/winbindd_dual.c:238
> #15 0x000000080984bf37 in tevent_common_invoke_immediate_handler () from /usr/local/lib/libtevent.so.0
> #16 0x000000080984bf94 in tevent_common_loop_immediate () from /usr/local/lib/libtevent.so.0
> #17 0x000000080984e17c in ?? () from /usr/local/lib/libtevent.so.0
> #18 0x000000080984ae4e in _tevent_loop_once () from /usr/local/lib/libtevent.so.0
> #19 0x000000000104e9c2 in main (argc=<optimized out>, argv=<optimized out>) at ../source3/winbindd/winbindd.c:1911


smb.conf stuff related to Winbindd:

> ; Security type
> security = ADS
> realm = AD.LIU.SE
> workgroup = AD
> 
> ;; ID Mappings
> idmap config * : backend = tdb
> idmap config * : range = 2000000001-2100000000
> idmap config AD : backend = ad
> idmap config AD : range = 1-2000000000
> idmap config AD : schema_mode = rfc2307
> idmap config AD : unix_primary_group = yes


> winbind nested groups = false
> winbind enum users = false
> winbind enum groups = false
> winbind use default domain = yes
> winbind normalize names = yes
> winbind max clients = 1000
> winbind max domain connections = 10
> winbind nss info = template



Let me know if there is something else I can check to pinpoint this leak…

- Peter


-- 
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba