Re: [Samba] samba getting stuck, highwatermark replication issue?




Hi James, list

We really appreciate your input on this, thanks!

On 10/12/2017 04:12 PM, lingpanda101 via samba wrote:
> MJ,
>
>     A dev or someone else may be able to assist, but your DCs aren't replicating correctly with each other. Those dangling links should have been purged by now if they refer to a DC removed several years ago.

This is rather worrying :-|

Especially since I have all kinds of scripts in place that continuously check replication: hourly using "samba-tool drs showrepl", plus "samba-tool ldapcmp" every other hour.

So one can have problems even when all the built-in checks succeed. :-(

Currently DC2 has high CPU usage, and grepping log.samba for "Succeeded" gives this kind of result:

  Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
  Replicated 3 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com

All zero, with some exceptions...

I imagine this looks better; here is a sample from the DCs without high CPU:
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=ForestDnsZones,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for CN=Configuration,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
  Replicated 0 objects (0 linked attributes) for CN=Schema,CN=Configuration,DC=samba,DC=company,DC=com
  Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 4 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 4 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 2 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com
  Replicated 1 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com

Some zeros, but many indications that it is actually replicating data.
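To compare excerpts like the two above more systematically than by eye, one option is to tally the "Replicated ..." lines per partition. This is a minimal sketch, not part of samba-tool: `summarize` is a hypothetical helper, and it only assumes the line format visible in these logs, "Replicated N objects (M linked attributes) for <DN>".

```python
import re
from collections import defaultdict

# Matches the commit summary lines seen in the excerpts above, e.g.
# "Replicated 3 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,..."
REPL_RE = re.compile(
    r"Replicated (\d+) objects \((\d+) linked attributes\) for (.+)$"
)


def summarize(log_text):
    """Return {partition DN: [cycles, total objects, total linked attributes]}."""
    stats = defaultdict(lambda: [0, 0, 0])
    for line in log_text.splitlines():
        match = REPL_RE.search(line)
        if match is None:
            continue  # not a replication summary line
        objects, links = int(match.group(1)), int(match.group(2))
        entry = stats[match.group(3).strip()]
        entry[0] += 1        # one more replication cycle for this partition
        entry[1] += objects  # objects applied in that cycle
        entry[2] += links    # linked attributes applied in that cycle
    return dict(stats)
```

Running it over the text of log.samba from each DC (its location depends on how Samba was built or packaged) gives comparable per-partition totals, so a DC that logs many cycles with almost no objects stands out next to DCs reporting steady small counts.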

> Did you do a full replication from a known good DC to the other two?

Well, at this point I have no idea which DC I can consider "a good DC".

> This doesn't always fix the issue, but it is a good start. You didn't by chance restore a DC from backup recently, or have one offline and power it on again recently?

No. These three DCs have been online for many years, ever since DC1 was removed. (We never demoted DC1, since it had crashed, so we removed it from the database manually; that is perhaps why some remnants are left.)

The fact that there are still two 'dangling forward links', identical on all DCs, makes me think that we simply missed those when we manually removed all DC1 references. This happened back in the Samba 4.1 days.

> The highwatermark value tells the source DC which objects the destination DC is requesting updates for. The high CPU usage seems to be due to the DC doing a full partition replication. The fact that, as you stated, this issue can happen on all 3 makes it even tougher to help. I would normally advise just demoting the affected DC and joining it again.
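To illustrate the highwatermark idea in the abstract, here is a deliberately simplified toy model, not Samba's code and not the DRS wire protocol: each object carries a single update sequence number (USN), the destination remembers one watermark per source, and all class and method names are hypothetical. Real replication also uses up-to-dateness vectors and per-attribute metadata, which this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDC:
    """Toy source DC: every write bumps a local update sequence number (USN)."""
    usn: int = 0
    objects: dict = field(default_factory=dict)  # object name -> USN of last change

    def write(self, name):
        self.usn += 1
        self.objects[name] = self.usn

    def get_changes(self, highwatermark):
        """Return objects changed after the given USN, plus the new watermark."""
        changed = sorted(n for n, u in self.objects.items() if u > highwatermark)
        return changed, self.usn


@dataclass
class DestinationDC:
    highwatermark: int = 0  # last source USN this DC has already applied

    def pull(self, source):
        changed, new_hwm = source.get_changes(self.highwatermark)
        self.highwatermark = new_hwm  # next pull only asks for newer changes
        return changed
```

In this model an incremental pull with nothing new returns an empty list (the analogue of "Replicated 0 objects"), while resetting `highwatermark` to 0 makes the source send every object again, which is why a full partition replication costs far more than a normal cycle.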

Perhaps I should try to find a combination of two DCs that works (check replication, verify with ldapcmp, make sure there is no high CPU, etc.), then trust those two and demote the third.

Any input here would be very welcome... Here's a bit of the log leading up to the "Replicated 0 objects" on the current high-CPU DC; hopefully that reveals something..?

  Not authoritative for '_kerberos.com', forwarding
[2017/10/12 06:00:16.744615,  2] ../source4/dns_server/dns_query.c:1019(dns_server_process_query_send)
  Not authoritative for '_kerberos.com', forwarding
[2017/10/12 06:00:16.745393,  2] ../source4/dns_server/dns_query.c:1019(dns_server_process_query_send)
  Not authoritative for '_kerberos.com', forwarding
[2017/10/12 06:00:16.745731,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: AS-REQ authtime: 2017-10-12T06:00:16 starttime: unset endtime: 2017-10-12T16:00:16 renew till: unset
[2017/10/12 06:00:16.745830,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: Client supported enctypes: aes256-cts-hmac-sha1-96, aes128-cts-hmac-sha1-96, des3-cbc-sha1, des3-cbc-md5, arcfour-hmac-md5, using arcfour-hmac-md5/arcfour-hmac-md5
[2017/10/12 06:00:16.745975,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: Requested flags: forwardable
[2017/10/12 06:00:16.748679,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ MEMBERSERVER$@SAMBA.COMPANY.COM from ipv4:192.168.89.2:40725 for ldap/dc2.SAMBA.COMPANY.COM@xxxxxxxxxxxxxxxxx [canonicalize]
[2017/10/12 06:00:16.754551,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
[2017/10/12 06:00:16.755962,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41634 for ldap/DC2.SAMBA.COMPANY.COM@xxxxxxxxxxxxxxxxx [canonicalize]
[2017/10/12 06:00:16.762012,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
[2017/10/12 06:00:16.762249,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
  Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
[2017/10/12 06:00:16.762249,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
  Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
[2017/10/12 06:00:16.762320,  3] ../source4/smbd/process_single.c:114(single_terminate)
  single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
[2017/10/12 06:00:16.762967,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ MEMBERSERVER$@SAMBA.COMPANY.COM from ipv4:192.168.89.2:40726 for krbtgt/SAMBA.COMPANY.COM@xxxxxxxxxxxxxxxxx [forwarded, forwardable]
[2017/10/12 06:00:16.765363,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
[2017/10/12 06:00:16.765585,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
  Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
[2017/10/12 06:00:16.765679,  3] ../source4/smbd/process_single.c:114(single_terminate)
  single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
[2017/10/12 06:00:16.766324,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41635 for krbtgt/SAMBA.COMPANY.COM@xxxxxxxxxxxxxxxxx [forwarded, forwardable]
[2017/10/12 06:00:16.768612,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
[2017/10/12 06:00:16.768836,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
  Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
[2017/10/12 06:00:16.768907,  3] ../source4/smbd/process_single.c:114(single_terminate)
  single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
[2017/10/12 06:00:16.769475,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
  Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
[2017/10/12 06:00:16.769542,  3] ../source4/smbd/process_single.c:114(single_terminate)
  single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
[2017/10/12 06:00:16.799101,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41637 for ldap/dc2.SAMBA.COMPANY.COM@xxxxxxxxxxxxxxxxx [canonicalize]
[2017/10/12 06:00:16.808786,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
[2017/10/12 06:00:16.809681,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
  Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
[2017/10/12 06:00:16.809767,  3] ../source4/smbd/process_single.c:114(single_terminate)
  single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
[2017/10/12 06:00:16.817237,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ DC2$@SAMBA.COMPANY.COM from ipv4:192.87.143.15:41638 for krbtgt/SAMBA.COMPANY.COM@xxxxxxxxxxxxxxxxx [forwarded, forwardable]
[2017/10/12 06:00:16.819573,  3] ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
  Kerberos: TGS-REQ authtime: 2017-10-12T06:00:16 starttime: 2017-10-12T06:00:16 endtime: 2017-10-12T16:00:16 renew till: unset
[2017/10/12 06:00:16.820289,  3] ../source4/smbd/service_stream.c:66(stream_terminate_connection)
  Terminating connection - 'kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED'
[2017/10/12 06:00:16.820368,  3] ../source4/smbd/process_single.c:114(single_terminate)
  single_terminate: reason[kdc_tcp_call_loop: tstream_read_pdu_blob_recv() - NT_STATUS_CONNECTION_DISCONNECTED]
[2017/10/12 06:00:16.843259,  2] ../source4/dsdb/repl/replicated_objects.c:1016(dsdb_replicated_objects_commit)
  Replicated 0 objects (0 linked attributes) for DC=DomainDnsZones,DC=samba,DC=company,DC=com

Lots of NT_STATUS_CONNECTION_DISCONNECTED. Ideas, anyone..?

MJ

--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba