
[Samba] samba AD database suspected corruption




Hi,

Back in the samba 4.1 days, we experienced a samba database corruption: tombstones were not being deleted from sam.ldb, ultimately resulting in a huge database, a full root disk, and samba crashing. We were completely down. We asked the great guys at sernet to help; they did super work and managed to get us up and running again, including the addition of a fresh DC4.

Currently on 4.5.15, we have some strange issues with our samba AD setup that I feel are remnants of these old problems. Specifically:
- we cannot transfer fsmo roles between DCs due to LDAP error 50, insufficient access rights
- we have high cpu usage across the DCs, combined with continuous "highwatermark" errors on the same DC
- occasionally (2, 3 times a week) the DCs lock up, get stuck

Having said that, I think I found a way out, but would appreciate some feedback from the experts here.

In an isolated test setup, I started a clone of DC2/DC3/DC4 and verified that replication is working correctly, ldapcmp as well, etc. Then I added a new DC5. DC2 (fsmo roles owner) did not pick it up at all, DC3 picked it up with WERR_DS_DRA_ACCESS_DENIED, and only DC4 picked it up nicely. So I rolled back, shut down DC2, seized the fsmo roles on DC4, and added a new samba 4.7 DC5. DC4 picked it up nicely again.

DC3 still gave WERR_DS_DRA_ACCESS_DENIED, so I shut down DC3 as well and focussed on just DC4 (samba 4.5.15) and DC5 (samba 4.7). In my isolated test setup this seems to work nicely: I could log on to a domain member server, a regular win7 workstation logon works, ADUC and MS DNS manager work, etc. Replication works and ldapcmp confirms it, so this looks quite good. DNS is correctly updated to the new situation.
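For anyone wanting to repeat the checks above, here is a rough sketch of the commands involved, written as a small Python helper that only builds the command lines and prints them. The host names dc4.samba.domain.com and dc5.samba.domain.com are assumptions (only the realm appears in my logs), and the function name is made up for illustration.

```python
import shlex


def verification_commands(dc_a, dc_b,
                          contexts=("domain", "configuration", "schema")):
    """Return the replication checks as argv lists; nothing is executed."""
    return [
        # Who currently holds the fsmo roles
        ["samba-tool", "fsmo", "show"],
        # Inbound/outbound replication status of the local DC
        ["samba-tool", "drs", "showrepl"],
        # Compare the directory partitions of the two DCs
        ["samba-tool", "ldapcmp",
         f"ldap://{dc_a}", f"ldap://{dc_b}", *contexts],
    ]


if __name__ == "__main__":
    # Hypothetical host names, adjust to the real ones.
    for argv in verification_commands("dc4.samba.domain.com",
                                      "dc5.samba.domain.com"):
        print(shlex.join(argv))
```

The printed lines can be pasted into a root shell on a DC; ldapcmp will usually also need credentials, which are left out here.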

However, I have some questions I'd like to ask, before proceeding.

- GPO: I think I have to take idmap.ldb from the old DC4, copy it to DC5, set up SysVol rsync to DC5 as well, restart samba, run samba-tool ntacl sysvolreset ONCE, and never run it again, right? (Asking because DC4 was NOT our old fsmo roles owner and 'primary GPO DC'.)

- Can I re-use the old dns/ip for DC1 / DC2 and DC3? (I ran samba-tool domain demote --remove-other-dead-server=DC1/DC2/DC3 on both remaining DCs) Is this safe to do?
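To be explicit about the demote step: "DC1/DC2/DC3" above is shorthand, and as far as I understand --remove-other-dead-server takes one server name per run. A minimal sketch that prints one command per dead DC (the helper name is made up, nothing is executed):

```python
import shlex


def demote_commands(dead_dcs):
    """Build one 'samba-tool domain demote' argv per dead DC name."""
    return [
        ["samba-tool", "domain", "demote",
         f"--remove-other-dead-server={name}"]
        for name in dead_dcs
    ]


if __name__ == "__main__":
    for argv in demote_commands(["DC1", "DC2", "DC3"]):
        print(shlex.join(argv))
```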

Also, upgrading the remaining samba 4.5.15 DC4 to samba 4.7 causes showrepl to become EXTREMELY slow on that DC.

After upgrading to 4.7, showrepl still works on DC5, ADUC works against both, and ldapcmp on DC4 still runs quickly; only samba-tool drs showrepl on the upgraded 4.7 DC4 becomes slow (10 to 15 minutes).

A level 10 debug log tells me that it waits *MANY* minutes after:
kinit for DC4$@SAMBA.DOMAIN.COM succeeded
and also many minutes after:
GSSAPI credentials for DC4$@SAMBA.DOMAIN.COM will expire in 35664 secs

In the end it does produce the expected output that replication is working.

I have a full -d10 log available if anyone would like to see it
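In case it helps to pin down exactly where the time goes, here is a small sketch of how the stalls could be measured: it runs a command, notes when each output line arrives, and reports the longest pauses together with the line that preceded them. The function names are made up, and it assumes the tool writes its debug lines unbuffered (if samba-tool buffers output when piped, the timings will be off).

```python
import subprocess
import sys
import time


def time_gaps(argv):
    """Run argv and return (seconds_until_next_line, line) pairs.

    stdout and stderr are merged so debug output is included.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, bufsize=1)
    gaps = []
    prev_line = "<start>"
    last = time.monotonic()
    for line in proc.stdout:
        now = time.monotonic()
        # The pause belongs to the line that was printed before it.
        gaps.append((now - last, prev_line))
        prev_line, last = line.rstrip("\n"), now
    proc.wait()
    gaps.append((time.monotonic() - last, prev_line))
    return gaps


def slowest(gaps, top=5):
    """Return the 'top' longest pauses, longest first."""
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


if __name__ == "__main__":
    cmd = sys.argv[1:] or ["samba-tool", "drs", "showrepl", "-d10"]
    for gap, line in slowest(time_gaps(cmd)):
        print(f"{gap:8.1f}s after: {line}")
```

Run on the upgraded DC4, I would expect the kinit and GSSAPI lines quoted above to show up at the top of the list.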

If I cannot get DC4 upgraded to 4.7, I could of course retire that one too and proceed with only the new DC5. But it would be nicer to keep DC4.

All in all, this has taken up a lot of my time lately. I am very happy that my production environment (dc2/dc3/dc4) is still running, even if with the occasional lockup...

Anyway, all feedback is welcome, including tips, suggestions, different approaches, etc, etc. This is all done just in a test environment...

Please, suggestions? More info?

MJ
