
Re: [Samba] samba getting stuck, highwatermark replication issue?




On 10/9/2017 1:28 PM, mj via samba wrote:
Hi all,

We would appreciate some input here. Not sure where to look...

We have three AD DCs, all running samba 4.5.10, and for the past few days the samba DCs have been getting stuck regularly, at random times. It happens to all three of them, currently up to a few times per day, so there must be some common cause.

For the rest, the systems appear fine, enough diskspace, nothing special in syslog, etc.

We usually detect that a DC has become stuck because LDAP auth no longer works on that DC. Checking with "service sernet-samba-ad status" will still report "Running".

After shutting down samba ("service sernet-samba-ad stop") one process usually is still running, and prevents a restart from succeeding, always because:

Failed to listen on 0.0.0.0:135 - NT_STATUS_ADDRESS_ALREADY_ASSOCIATED

ps aux tells me that the process is: "samba -D"

Killing that process makes samba startup succeed, replication work again, and samba function, until the next time this happens.
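
As an illustration of the restart problem described above, here is a minimal Python sketch that checks whether anything is still accepting TCP connections on a port before samba is started again. The function name port_in_use and the default host are assumptions made for this example, not part of samba or the sernet packages. A successful connect only shows that some process is listening on the port; it does not prove that the listener is the leftover "samba -D" process.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration only: it attempts a plain TCP
    connect and reports whether the connect succeeded.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0
```

Calling port_in_use(135) after "service sernet-samba-ad stop" and getting True would suggest that a process still holds the endpoint mapper port, which matches the NT_STATUS_ADDRESS_ALREADY_ASSOCIATED error above.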

But WHY is samba getting stuck in the first place?

We are getting the following unusual messages in the logs on all three DCs:
../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=a_username,CN=Users,DC=samba,DC=company,DC=com)
../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
../source4/rpc_server/drsuapi/getncchanges.c:1961: DsGetNCChanges 2nd replication on DN DC=samba,DC=company,DC=com older highwatermark (last_dn CN=Schema Admins,CN=Users,DC=samba,DC=company,DC=com)
and the last line keeps repeating 2 - 3 times per second, completely filling up the logs. The username in the first message differs per DC, but on each DC it usually remains the same. (I have seen 5 or 6 different usernames in total.)
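
To see which last_dn values dominate such a flood of messages, a small Python sketch like the following could tally them. The regular expression is written against the message format quoted above and may not match other samba versions; the function name is an assumption for this example, and DNs containing a ")" would be cut short.

```python
import re
from collections import Counter

# Pattern derived from the log lines quoted in this thread (samba 4.5.10).
HWM_RE = re.compile(
    r"DsGetNCChanges 2nd replication on DN (?P<nc>\S+) "
    r"older highwatermark \(last_dn (?P<last_dn>[^)]*)\)"
)


def count_highwatermark_messages(lines):
    """Count 'older highwatermark' messages per last_dn value."""
    counts = Counter()
    for line in lines:
        # finditer copes with several messages sharing one physical line.
        for match in HWM_RE.finditer(line):
            counts[match.group("last_dn")] += 1
    return counts
```

Feeding it the lines of the samba log (for example via open(path) on whatever log file the DC writes to) gives a per-DN count, which makes it easier to compare the repeating entry across the three DCs.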

samba-tool dbcheck --cross-ncs looks similar on all three DCs, with *many* errors about unsorted attributes, that I think I've been told in the past are harmless:
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0002000d
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020002
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00020001
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x0000000d
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000003
CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com: 0x00000000
ERROR: unsorted attributeID values in replPropertyMetaData on CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com

Not fixing replPropertyMetaData on CN=ykqr002614,CN=Computers,DC=samba,DC=company,DC=com

Please use --fix to fix these errors
Checked 4948 objects (4193 errors)

All of the errors are about unsorted attributeID values, with the following exception: there still appear to be some references to an old DC that was removed many years ago:
ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=84bea0a7-82dd-4237-9296-030573700698,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187541>;<RMD_ORIGINATING_USN=3630>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com
ERROR: no target object found for GUID component for msDS-NC-Replica-Locations in object CN=d9d76e21-8cae-457d-b212-6cb192612739,CN=Partitions,CN=Configuration,DC=samba,DC=company,DC=com - <GUID=81a27497-bdfb-4977-9874-675bbfba490f>;<RMD_ADDTIME=130405075610000000>;<RMD_CHANGETIME=130405075610000000>;<RMD_FLAGS=0>;<RMD_INVOCID=556b2cb4-e576-48e2-bb7c-7f62caee84fc>;<RMD_LOCAL_USN=187515>;<RMD_ORIGINATING_USN=3631>;<RMD_VERSION=0>;CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=company,DC=com
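
With thousands of dbcheck errors, a rough tally by message type helps confirm that only the two kinds shown above are present. The following Python sketch assumes one message per line and relies on the message prefixes quoted in this thread; the category labels and the function name are made up for this example.

```python
from collections import Counter

# Message prefixes as they appear in the dbcheck output quoted above.
UNSORTED_PREFIX = "unsorted attributeID values in replPropertyMetaData"
DANGLING_PREFIX = "no target object found for GUID component"


def summarize_dbcheck(lines):
    """Count dbcheck ERROR lines by (hypothetical) category label."""
    kinds = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("ERROR: "):
            continue  # skip attribute dumps, "Not fixing ..." lines, etc.
        message = line[len("ERROR: "):]
        if message.startswith(UNSORTED_PREFIX):
            kinds["unsorted replPropertyMetaData"] += 1
        elif message.startswith(DANGLING_PREFIX):
            kinds["dangling GUID link"] += 1
        else:
            kinds["other"] += 1
    return kinds
```

The input could be the saved output of "samba-tool dbcheck --cross-ncs", read line by line; any count under "other" would point to errors beyond the two types discussed here.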

That's about all info I can gather.

The very basic smb.conf on the DCs:

[global]
    workgroup = WRKGRP
    realm = samba.company.com
    netbios name = DC4
    server role = active directory domain controller
    log level = 3
    dns forwarder = 192.x.x.x
    server signing = mandatory
    ntlm auth = yes
    ldap server require strong auth = no
    idmap_ldb:use rfc2307 = yes

[netlogon]
    path = /var/lib/samba/sysvol/samba.company.com/scripts
    read only = No

[sysvol]
    path = /var/lib/samba/sysvol
    read only = No
    acl_xattr:ignore system acls = yes

We have been running 4.5.10 since May 2017, and this issue started this week.

Anyone with an idea?

You should be able to fix the 'replPropertyMetaData' errors with:

samba-tool dbcheck --cross-ncs --fix --yes 'fix_replmetadata_unsorted_attid'

The highwatermark message doesn't necessarily indicate a problem. It's part of how the destination DC keeps track of changes from the source DC. Can you verify that the time and date are correct on all DCs?
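
One rough way to compare the clocks is to collect the current epoch time from every DC at about the same moment (for example by running "date +%s" on each) and look at the spread. The Python sketch below only does the comparison; the function name and the DC labels in the usage note are hypothetical, and the 300 second limit is the usual Kerberos default clock-skew tolerance, which may have been changed on your site.

```python
KERBEROS_DEFAULT_SKEW = 300  # seconds; common default, may differ per site


def max_clock_skew(epoch_by_dc):
    """Return (skew_seconds, earliest_dc, latest_dc) for a dict of
    DC name -> epoch seconds sampled at roughly the same moment."""
    earliest = min(epoch_by_dc, key=epoch_by_dc.get)
    latest = max(epoch_by_dc, key=epoch_by_dc.get)
    skew = epoch_by_dc[latest] - epoch_by_dc[earliest]
    return skew, earliest, latest
```

For example, max_clock_skew({"dcA": 1507548480, "dcB": 1507548482, "dcC": 1507548900}) reports a skew of 420 seconds between dcA and dcC, which is above KERBEROS_DEFAULT_SKEW and would be worth fixing before looking further.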

The GUID errors seem related to your old DC being gone while its NTDS connections still linger. Open Active Directory Sites and Services and remove the ones no longer needed.



--
James

