rsnapshot is Throwing an Intermittent Error.

On most days or nights, my rsnapshot cron job runs perfectly, and
then there is the following:

From:    root@wb5agz (Cron Daemon)
Subject: Cron <root@wb5agz> /usr/local/etc/daily_backup

/bin/rm: cannot remove '/var/cache/rsnapshot/halfday.1/wb5agz/home/usr/lib/grub/i386-pc': Transport endpoint is not connected
/bin/rm: cannot remove '/var/cache/rsnapshot/halfday.1/wb5agz/home/usr/lib/libbind9.so.80.0.7': Transport endpoint is not connected

This can go on for hundreds of lines and always references a
different set of directories and files.

	There can also be weeks of error-free backups in which
everything just works.

I think I am creating this mess by the way I do the backups since
it smells like some sort of race condition.  The daily backup
script follows:

#Do the halfday backup first.
#The halfday mounts and unmounts the backup media.
/usr/bin/rsnapshot halfday
#Mount backup media first since this next is just a rotation.
mount /rsnapshot1 >/dev/null 2>&1 || exit 1
mount /rsnapshot2 >/dev/null 2>&1 || exit 1
mhddfs /rsnapshot1,/rsnapshot2 /var/cache/rsnapshot -o mlimit=100M >/dev/null 2>&1
/usr/bin/rsnapshot daily
umount /var/cache/rsnapshot /rsnapshot2 /rsnapshot1
exit 0

As far as I know, both mount and umount block until they reach
some sort of resolution, be it success or failure.  I have even
seen umount hang for a perceptible amount of time when a large
number of blocks have changed and sync hasn't had time to catch
up, so I am curious as to what may be happening.
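
A guard along the following lines could be placed ahead of each
rsnapshot call to tell a missing or unresponsive mount apart from
a problem in rsnapshot itself.  It is only a sketch, not a
diagnosis: the helper names is_mounted, fs_responds and
backup_fs_ready are hypothetical, and reading /proc/mounts assumes
a Linux system.  "Transport endpoint is not connected" is the error
a FUSE mount such as mhddfs usually returns once its userspace
process is no longer there, which is why the sketch also tries to
list the directory instead of only checking the mount table.

```shell
#!/bin/sh
# Hypothetical helpers; they are not part of rsnapshot or mhddfs.

# is_mounted DIR [TABLE]
# Succeeds if DIR appears as a mount point (field 2) in TABLE.
# TABLE defaults to /proc/mounts, which is Linux-specific.
# Paths containing spaces are escaped in /proc/mounts and are
# not handled here.
is_mounted() {
    awk -v d="$1" '$2 == d { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# fs_responds DIR
# Succeeds if DIR can be listed.  A FUSE mount whose userspace
# process has gone away usually fails here.
fs_responds() {
    ls -A "$1" >/dev/null 2>&1
}

# backup_fs_ready DIR [TABLE]
# Succeeds only if DIR is both listed as mounted and readable.
backup_fs_ready() {
    is_mounted "$1" "${2:-/proc/mounts}" && fs_responds "$1"
}
```

In the daily script this might be used as
"backup_fs_ready /var/cache/rsnapshot || exit 1" on the line just
before "/usr/bin/rsnapshot daily", so that a failed or dead mhddfs
mount stops the run instead of producing hundreds of rm errors.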

	I was able to simply re-run the very same command later
for the daily backup and it worked without so much as a peep out
of anything. It just ran.

	The two backup drives in question passed fsck -f -y without
a single issue.  In most of these big error spews, the files are
in places that don't change on a regular basis, so it's not as if
I caught a log file just as it was being backed up.  I don't
remember ever seeing log files in the spew.

	I always leave the backup drives unmounted unless I need
something off of them or the cron job is running since it would
seem to make sense to not mount them continuously.  A couple of
weeks ago, we got 4 electrical power blinks in one day so you
don't want your backup media mounted any more than necessary.

	When I do pull something off of the backups, it is good
and not corrupted so far, so I am really curious as to what is
happening.

Martin WB5AGZ