
[Samba] samba-ad restart fails occasionally




Hi,

We are running SerNet Samba, and on one particular DC (Debian 7.11, Samba 4.5.6), once logrotate has finished rotating the logs, "sernet-samba-ad restart" fails with:

Shutting down SAMBA AD services : ...trying once more ... (warning).
...trying once more ... (warning).
.....
...trying once more ... (warning).
Error: /usr/sbin/samba still running with PID=14755 from /var/run/samba/samba.pid ... failed!
Starting SAMBA AD services : Warning: /usr/sbin/samba already running ! ... (warning).

So the script does go on to start Samba again, but because the old process is still running, nothing is actually started, and this causes all kinds of failures:

- Replication fails with (WERR_CONNECTION_REFUSED)
- LDAP queries to that dc fail with query error: Transport endpoint is not connected
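When it is in this state, it may help to confirm whether the PID in the pidfile really is a leftover samba process. Below is a minimal sketch that reuses the same readlink test as the init script. The function name check_leftover and its output strings are made up for illustration, and it assumes a Linux /proc filesystem.

```shell
#!/bin/sh
# Hypothetical helper (not part of sernet-samba-ad): report whether the PID
# recorded in a pidfile is still alive and still running the expected binary.
# Uses the same /proc/<pid>/exe test as the init script.
check_leftover() {
	pidfile=$1
	binary=$2

	[ -r "$pidfile" ] || { echo "no pidfile"; return 1; }
	pid=$(cat "$pidfile")

	# /proc/<pid>/exe points at the executable of a live process.
	if readlink "/proc/${pid}/exe" 2>/dev/null | grep -q "^${binary}"; then
		echo "still running: PID=${pid}"
		return 0
	fi

	echo "stale pidfile: PID=${pid}"
	return 1
}
```

On the DC this could be called as: check_leftover /var/run/samba/samba.pid /usr/sbin/samba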

Rebooting the DC solves everything, but scheduling a reboot every morning is not an elegant solution. :-)

Here is the part of the sernet-samba-ad script where the restart occurs:

	PID=$(cat ${PIDFILE})

	if ! (readlink /proc/${PID}/exe | grep -q "^${BINARY}") ; then
		log_warning_msg "Warning: ${BINARY} not running with PID=${PID} from ${PIDFILE} ! "
		exit 0
	fi

	kill -15 ${PID}
	for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30; do
		sleep 3
		kill -0 ${PID} >/dev/null 2>&1 || break
		log_warning_msg "...trying once more "
		kill -15 ${PID}
	done

	kill -0 ${PID} >/dev/null 2>&1 || {
		log_success_msg ""
		rm -f ${PIDFILE}
		exit 0
	}

How dangerous would it be to replace the last "kill -15 ${PID}" with "kill -9 ${PID}"?
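For illustration, here is a sketch of a somewhat more conservative variant: keep sending SIGTERM first, as the loop above does, and only fall back to SIGKILL once the grace period has run out. The function name stop_with_fallback and its tries/delay parameters are hypothetical and not part of the sernet script. Note that SIGKILL cannot be caught, so the process gets no chance to clean up. Whether that is safe for samba is exactly the open question here.

```shell
#!/bin/sh
# Sketch only: ask a process to exit with SIGTERM, poll for a while, and
# fall back to SIGKILL only if it is still alive after the grace period.
# stop_with_fallback, tries and delay are hypothetical names.
stop_with_fallback() {
	pid=$1
	tries=${2:-30}   # number of polls, matching the 30 rounds in the script
	delay=${3:-3}    # seconds between polls

	# If the signal cannot be delivered, the process is already gone
	# (or is not ours to signal), so there is nothing left to do.
	kill -15 "$pid" 2>/dev/null || return 0

	i=0
	while [ "$i" -lt "$tries" ]; do
		sleep "$delay"
		kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
		i=$((i + 1))
	done

	# Last resort: SIGKILL cannot be caught, so no cleanup handlers run.
	kill -9 "$pid" 2>/dev/null
	sleep 1

	# Succeed only if the process has really disappeared.
	! kill -0 "$pid" 2>/dev/null
}
```

In the script this would be called as stop_with_fallback "${PID}", with the existing "rm -f ${PIDFILE}" kept for the success case. It only signals the one PID from the pidfile, just like the original loop.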

Does anyone have any suggestions on how to solve this?

MJ
