[Samba] samba getting stuck, highwatermark replication issue?




Hi all,

We would appreciate some input here. Not sure where to look...

We have three AD DCs, all running samba 4.5.10, and for the past few days the samba DCs have been getting stuck regularly, at random times. It happens to all three of them, currently up to a few times per day, so there must be some common cause.

Otherwise the systems appear fine: enough disk space, nothing special in syslog, etc.

We usually detect that a DC has become stuck because LDAP auth no longer works on that DC. Checking with "service sernet-samba-ad status" still reports "Running".

After shutting down samba ("service sernet-samba-ad stop"), one process is usually still running and prevents a restart from succeeding, always with this error:

Failed to listen on 0.0.0.0:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED
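
For reference, the "address already associated" condition can be checked independently of samba. This is only a rough sketch in Python: the helper name port_in_use is made up for illustration, and binding to port 135 needs root (without it the call raises a permission error instead of returning a result).

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if another process already holds host:port (TCP).

    Hypothetical helper, not part of samba. It attempts a plain bind and
    treats EADDRINUSE as "still occupied"; any other error is re-raised.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        s.close()
    return False
```

Calling port_in_use(135) as root right after "service sernet-samba-ad stop" would be expected to return True while the leftover process is still holding the endpoint mapper port.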

ps aux tells me that the process is: "samba -D"

Killing that process makes the samba startup succeed, replication work again, and samba function, until the next time this happens.
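
A minimal sketch of how the leftover process could be picked out of "ps aux" output, in Python. The function name find_leftover_samba is hypothetical, and the sketch assumes the usual 11-column "ps aux" layout with the command line in the last column; it only lists PIDs and does not kill anything.

```python
import os
from typing import List


def find_leftover_samba(ps_output: str) -> List[int]:
    """Return PIDs whose command line is 'samba -D' (any path prefix).

    Assumes standard 'ps aux' output: a header line, then 10 fixed
    columns followed by the full command line.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        argv = fields[10].split()
        # Match '/usr/sbin/samba -D' as well as plain 'samba -D',
        # but not e.g. 'grep samba -D' or 'smbd -D'.
        if os.path.basename(argv[0]) == "samba" and argv[1:] == ["-D"]:
            pids.append(int(fields[1]))
    return pids
```

The input would be the text captured from "ps aux" after the service has been stopped; any PID returned at that point is a process that did not exit.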

But WHY is samba getting stuck in the first place?

We are getting the following unusual messages in the logs on all three DCs:
  ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=a_username,CN=Users,DC=samba,DC=company,DC=com)
  ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
  ../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
and the last line keeps repeating 2 - 3 times per second, completely filling up the logs. The start-off username differs per DC, but on each DC it usually remains the same. (I have seen 5 or 6 different usernames in total)
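
To see which last_dn values dominate without reading the whole log, the messages can be tallied. A rough Python sketch, assuming the message format is exactly as in the lines quoted above; the function name count_highwatermark_warnings is made up for illustration.

```python
import re
from collections import Counter

# Matches the "older highwatermark" message as it appears in the quoted log lines.
PATTERN = re.compile(
    r"DsGetNCChanges 2nd replication on DN (?P<nc>.+?) "
    r"older highwatermark \(last_dn (?P<last_dn>.+)\)\s*$"
)


def count_highwatermark_warnings(lines):
    """Count 'older highwatermark' messages per last_dn value."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[m.group("last_dn")] += 1
    return counts
```

Passing an open log file to the function and calling most_common() on the result lists the DNs that the replication keeps restarting from, most frequent first.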

samba-tool dbcheck --cross-ncs looks similar on all three DCs, with *many* errors about unsorted attributes, that I think I've been told in the past are harmless:
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0002000d
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020002
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020001
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0000000d
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000003
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000000
ERROR: unsorted attributeID values in replPropertyMetaData on CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com

Not fixing replPropertyMetaData on CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com

Please use --fix to fix these errors
Checked 4948 objects (4193 errors)

All 4193 errors are about unsorted attributeID values, with the following exception: there still appear to be some references to an old DC (removed many YEARS ago):
ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=84bea0a7-82dd-4237-9296-030573700698,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187541>;<RMD_ORIGINATING_USN=3630>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com
ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=d9d76e21-8cae-457d-b212-6cb192612739,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187515>;<RMD_ORIGINATING_USN=3631>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com
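
The breakdown of error kinds can be double-checked by grouping the ERROR lines of the dbcheck output. A small Python sketch, assuming the ERROR lines look like the ones quoted above (the message followed by " on CN=..." or " in object ..."); the function name summarize_dbcheck is hypothetical.

```python
import re
from collections import Counter


def summarize_dbcheck(output):
    """Count dbcheck ERROR lines by kind, with the object DN stripped off."""
    kinds = Counter()
    for line in output.splitlines():
        if not line.startswith("ERROR: "):
            continue
        msg = line[len("ERROR: "):]
        # Keep only the text before the DN so identical errors group together.
        kind = re.split(r" on CN=| in object ", msg, maxsplit=1)[0]
        kinds[kind] += 1
    return kinds
```

Feeding it the saved output of "samba-tool dbcheck --cross-ncs" gives one count per error kind, which makes any rare errors easy to spot among the unsorted attributeID ones.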

That's about all info I can gather.

The very basic smb.conf on the DCs:

[global]
	workgroup = WRKGRP
	realm = samba.company.com
	netbios name = DC4
	server role = active directory domain controller
	log level = 3
	dns forwarder = 192.x.x.x
	server signing = mandatory
	ntlm auth = yes
	ldap server require strong auth = no
	idmap_ldb:use rfc2307 = yes

[netlogon]
	path = /var/lib/samba/sysvol/samba.company.com/scripts
	read only = No

[sysvol]
	path = /var/lib/samba/sysvol
	read only = No
	acl_xattr:ignore system acls = yes

We have been running 4.5.10 since May 2017, and this issue started this week.

Anyone with an idea?
