Re: [Samba] Replication Failure Issue

On 3/25/2018 8:54 PM, David Minard wrote:

On 24/03/18 01:35, lingpanda101 wrote:
On 3/22/2018 8:06 PM, David Minard wrote:
G'day All,

    Will reply to all messages so far in this one to keep it all together.

On 21/03/18 22:52, lingpanda101 wrote:
On 3/21/2018 7:32 AM, David Minard via samba wrote:
Thanks Carlos,

The thing is that I did not upgrade the version of Samba (that is the next step), so the ports used would not have changed. I only updated the OS.

On 21/03/2018, at 10:04 PM, Carlos Alberto Panozzo Cunha <carlos.hollow@xxxxxxxxx> wrote:

I had the same problem after updating Samba.
I allowed the new ports in the firewall.



On Wed, Mar 21, 2018, 00:15 David Minard via samba <samba@xxxxxxxxxxxxxxx> wrote:
G'day All,

         I have 4 DCs on CentOS 7.1. Everything was working really well for
years, including replication.

         Then I decided that the OS needed updating. Did the yum update on one
of the DCs, rebooted. That server is now running CentOS 7.4. Samba
seemed to start okay.

         However, samba-tool drs showrepl gives this error on all 3 of the other
DCs, and shows success on the updated DC.

         Default-First-Site-Name\SAMBA4-10 via RPC
                 DSA object GUID: 7fa7fc88-8d99-4217-b329-7e82324ec084

                 Last attempt @ Wed Mar 21 12:58:13 2018 AEDT failed, result 58

                 10623 consecutive failure(s).
                 Last success @ Thu Mar  8 14:34:14 2018 AEDT

         Any thoughts on why this DC is now not replicating properly? Any
thoughts on how to remedy this?
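
With several DCs to check, it can help to pull the failing links out of the showrepl text automatically. The following is a minimal sketch, assuming the output has the same shape as the fragment quoted above (a "<site>\<DC> via RPC" line followed by the attempt and failure-count lines). The function name failing_partners is hypothetical, and real showrepl output contains further sections (naming contexts, inbound and outbound neighbours) that this sketch simply ignores.

```python
import re

# Sample text copied from the showrepl output quoted in this thread.
SAMPLE = """\
Default-First-Site-Name\\SAMBA4-10 via RPC
        DSA object GUID: 7fa7fc88-8d99-4217-b329-7e82324ec084
        Last attempt @ Wed Mar 21 12:58:13 2018 AEDT failed, result 58
        10623 consecutive failure(s).
        Last success @ Thu Mar  8 14:34:14 2018 AEDT
"""


def failing_partners(showrepl_text):
    """Return (partner, result_code, consecutive_failures) for each failing link."""
    failures = []
    partner = None
    result = None
    for raw in showrepl_text.splitlines():
        line = raw.strip()
        # A new replication partner block starts with "<site>\<DC> via RPC".
        m = re.match(r"(\S+\\\S+) via RPC$", line)
        if m:
            partner, result = m.group(1), None
            continue
        # Remember the result code of a failed attempt.
        m = re.search(r"failed, result (\d+)", line)
        if m:
            result = int(m.group(1))
            continue
        # Record the link once the failure count for a failed attempt is seen.
        m = re.match(r"(\d+) consecutive failure\(s\)", line)
        if m and result is not None and int(m.group(1)) > 0:
            failures.append((partner, result, int(m.group(1))))
            result = None
    return failures


if __name__ == "__main__":
    for partner, code, count in failing_partners(SAMPLE):
        print(f"{partner}: result {code}, {count} consecutive failures")
```

Feeding it the saved output of samba-tool drs showrepl from each DC gives a quick list of which partners are failing and with which result code.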

You will most likely need to turn up the Samba log level to get additional information, but you can start by running 'yum history list all' and posting the results. This might help identify the changes that were made to the OS. Are you using BIND or the internal DNS?

I will turn up the logs and test it out.

I use Bind-9.9.4-51 (before update 9.9.4-18)

yum history shows 348 packages that got updated... Bind being one. Will sift through them.

My firewall is very loose. All ports are open for the subnets on which the Samba servers need to talk, e.g.:

-A INPUT -s -p tcp -m state --state NEW -m tcp -j ACCEPT
-A INPUT -s -p udp -m state --state NEW -m udp -j ACCEPT

When I first set this up with 4.0.0-a2 (or whatever it was right at the beginning), I was not able to work out exactly which ports were needed, hence the loose rules. Now that I see they are documented clearly on the Samba site, I will tighten them up, but not until the issue is resolved.
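
For the later tightening, a small generator can produce one rule per port in the same style as the rules above. This is only a sketch under stated assumptions: the port list is the AD DC list as I recall it from the Samba wiki and should be checked against the current page; the subnet 192.0.2.0/24 is a documentation placeholder, not the real subnet from this thread; the name iptables_rules is hypothetical; and the dynamic RPC range is passed in explicitly because it depends on the Samba version (the 4.7 release notes changed the default from 1024-1300 to 49152-65535).

```python
# AD DC ports as listed on the Samba wiki (assumption: verify before use).
AD_DC_PORTS = [
    ("tcp", "53"), ("udp", "53"),      # DNS
    ("tcp", "88"), ("udp", "88"),      # Kerberos
    ("tcp", "135"),                    # RPC endpoint mapper
    ("udp", "137"), ("udp", "138"),    # NetBIOS name service / datagram
    ("tcp", "139"),                    # NetBIOS session
    ("tcp", "389"), ("udp", "389"),    # LDAP / CLDAP
    ("tcp", "445"),                    # SMB over TCP
    ("tcp", "464"), ("udp", "464"),    # Kerberos kpasswd
    ("tcp", "636"),                    # LDAPS
    ("tcp", "3268"), ("tcp", "3269"),  # Global catalog / global catalog SSL
]


def iptables_rules(subnet, rpc_port_range):
    """Build one ACCEPT rule per port, plus one for the dynamic RPC range.

    rpc_port_range uses iptables syntax, e.g. "1024:1300".
    """
    rules = []
    for proto, port in AD_DC_PORTS + [("tcp", rpc_port_range)]:
        rules.append(
            f"-A INPUT -s {subnet} -p {proto} -m state --state NEW "
            f"-m {proto} --dport {port} -j ACCEPT"
        )
    return rules


if __name__ == "__main__":
    # Placeholder subnet; "1024:1300" is the pre-4.7 default dynamic RPC range.
    for rule in iptables_rules("192.0.2.0/24", "1024:1300"):
        print(rule)
```

Generating the rules from a single list keeps the TCP and UDP entries consistent and makes it easy to swap the RPC range after a Samba upgrade.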

My Samba is compiled from source. I am currently running 4.3.2. It's been running flawlessly, so there was no urgency to update until the huge security hole was announced the other week. Now I've got to get it done, but I want the ailing server going right first - or should I just do the updates and then worry about the ailing server?


# Global parameters
[global]
    workgroup = SCEM_AD
    realm = samba4.scem.westernsydney.edu.au
    netbios name = SAMBA4-10
    server role = active directory domain controller
    server services = s3fs, rpc, nbt, wrepl, ldap, cldap, kdc, drepl, winbindd, ntp_signd, kcc, dnsupdate

#        log level = 1 auth:2
# logs split per machine
        log file = /var/log/samba/log.%m
        # max 50KB per log file, then rotate
        max log size = 0

[netlogon]
    path = /usr/local/samba/var/locks/sysvol/samba4.scem.westernsydney.edu.au/scripts
    read only = No

[sysvol]
    path = /usr/local/samba/var/locks/sysvol
    read only = No

It is the out of the box config from the original provision.

I would hold off updating until you have fixed the DCs with the issues. Does anything in the Samba logs or yum history stand out? You can try to force replication with 'samba-tool drs replicate --full-sync' from FirstDC to SecondDC.

The first thing I tried was a forced replication of the NC that was unhappy:

# samba-tool drs replicate Broken-DC Working-DC DC=DomainDnsZones,DC=samba4,DC=scem,DC=westernsydney,DC=edu,DC=au --full-sync
Replicate from Working-DC to Broken-DC was successful.

After that, showrepl on all DCs showed everything as fine.

I held off sending this message for a couple of hours, and things are now showing up as broken again. I now have two DCs with the same issue, because I accidentally got the direction of the sync wrong: I gave the source first and the destination second, rather than destination first and source second. I should read the help a bit better!
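
Since the argument order (destination DC first, source DC second, then the naming context) is easy to get backwards, a tiny wrapper that forces the roles to be named can guard against it. This is a minimal sketch: the function replicate_command is hypothetical, it only builds the command line and does not run it, and the DC names are the placeholders already used in this thread.

```python
import shlex


def replicate_command(*, destination_dc, source_dc, naming_context, full_sync=False):
    """Build the argv for 'samba-tool drs replicate <destination> <source> <NC>'.

    Keyword-only arguments make the direction explicit at the call site.
    """
    if destination_dc == source_dc:
        raise ValueError("destination and source DC must differ")
    argv = ["samba-tool", "drs", "replicate", destination_dc, source_dc, naming_context]
    if full_sync:
        argv.append("--full-sync")
    return argv


if __name__ == "__main__":
    cmd = replicate_command(
        destination_dc="Broken-DC",   # the DC that should receive the changes
        source_dc="Working-DC",       # the known-good DC to pull from
        naming_context="DC=DomainDnsZones,DC=samba4,DC=scem,DC=westernsydney,DC=edu,DC=au",
        full_sync=True,
    )
    # Print a shell-safe version for review before running it by hand.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command rather than executing it leaves a chance to read the direction once more before anything is replicated.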

Anyway, this shows that manual replication seems successful, and that it might not be a firewall thing, as the second DC that now has the issue has not been updated in any way, shape, or form.

Now the strangest thing is that the two broken DCs report that everything is fine between them when I run showrepl. The working DCs, however, show both broken DCs as failing.

Before you try anything further I would suggest you make a good backup of your current DC not exhibiting any replication issues.


Have you tried redoing the forced replication from a known-good DC?

You can try to troubleshoot the issues further and attempt to resolve them, but the easiest thing IMO would be to join new DCs to the domain. Remove the other two DCs from the domain and never join them again.


