Re: File versioning based on shallow Git repositories?




On Thu, Apr 12 2018, Hallvard Breien Furuseth wrote:

> Can I use a shallow Git repo for file versioning, and regularly purge
> history older than e.g. 2 weeks?  Purged data MUST NOT be recoverable.
>
> Or is there a backup tool based on shallow Git cloning which does this?
> Push/pull to another shallow repo would be nice but is not required.
> The files are text files up to 1/4 Gb, usually with few changes.
>
>
> If using Git - I see "git fetch --depth" can shorten history now.
> How do I do that without 'fetch', in the origin repo?
> Also Documentation/technical/shallow.txt describes some caveats, I'm
> not sure how relevant they are.
>
> To purge old data -
>   git config core.logallrefupdates false
>   git gc --prune=now --aggressive
> Anything else?
>
> I'm guessing that without --aggressive, some expired info might be
> deduced from studying the packing of the remaining objects.  Don't
> know if we'll be required to be that paranoid.

The shallow feature isn't meant for this, but there's a much easier
solution that I've used for exactly this use-case, e.g. taking backups
of SQL dumps that delta-compress well, and then throwing out old
backups.

The steps are:

1. Create a backup.git repo.
2. Each time you make a backup, check out a new orphan branch (see "git
   checkout --orphan").
3. Copy the files over and commit them. "git log" at this point shows
   one commit, no matter how many times you've done this before.
4. Create a tag for this backup, e.g. one named after the current
   time, then delete the branch.
5. Apply a retention policy to the tags, e.g. with daily backups keep
   only the last 30 tags, which gives you 30 days of backups.
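
A rough sketch of steps 1-5 as a shell script, in case it helps. The
paths, the "backup-tmp" branch name, the "backup-*" tag prefix and
KEEP=3 are all made up for the demo, and it assumes a non-bare repo
(the checkout needs a work tree) and tag names that sort
chronologically:

```shell
#!/bin/sh
set -eu

REPO=$(mktemp -d)   # hypothetical backup repo (non-bare, needs a work tree)
SRC=$(mktemp -d)    # hypothetical directory holding the files to back up
KEEP=3              # retention: how many of the newest tags to keep

# Step 1: create the backup repo.
git init -q "$REPO"
git -C "$REPO" config user.name "backup"
git -C "$REPO" config user.email "backup@example.invalid"

backup_once() {
    tag=$1
    # Step 2: start a branch with no parent commit.
    git -C "$REPO" checkout -q --orphan backup-tmp
    # The index and work tree still hold the previous backup; empty them
    # so files that were deleted from $SRC do not linger.
    git -C "$REPO" rm -rfq --ignore-unmatch .
    # Step 3: copy the files over and commit them.
    cp -R "$SRC"/. "$REPO"/
    git -C "$REPO" add -A
    git -C "$REPO" commit -q -m "backup $tag"
    # Step 4: tag the commit, then drop the branch. HEAD is detached first
    # because git refuses to delete the branch that is checked out.
    git -C "$REPO" tag "$tag"
    git -C "$REPO" checkout -q --detach
    git -C "$REPO" branch -q -D backup-tmp
}

prune_tags() {
    # Step 5: list tags newest first and delete everything past $KEEP.
    git -C "$REPO" tag --list 'backup-*' --sort=-refname |
        tail -n +"$((KEEP + 1))" |
        while read -r old; do
            git -C "$REPO" tag -d "$old" >/dev/null
        done
}

# Demo with fixed tag names; for real use something like
# backup-$(date -u +%Y%m%dT%H%M%SZ) would do.
for day in 01 02 03 04 05; do
    echo "dump taken on day $day" > "$SRC/dump.sql"
    backup_once "backup-2018-04-$day"
    prune_tags
done

git -C "$REPO" tag --list                           # the 3 newest tags
git -C "$REPO" rev-list --count backup-2018-04-05   # prints 1
```

Deleting a tag here only removes the ref; the objects stay on disk
until gc prunes them.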

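Then as soon as you delete a tag, the commit it pointed to becomes
unreferenced (provided no reflog still lists it), and you can make
git-gc delete the data. By default gc keeps unreachable objects for
two weeks, so use "git gc --prune=now" if they have to go immediately.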
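
To see the purge actually happen, here's a small throwaway demo. The
repo path and the "old"/"new" tag names are invented for the example.
It assumes a non-bare repo, where HEAD's reflog still lists the old
commit, so the reflog is expired before gc runs:

```shell
#!/bin/sh
set -eu

REPO=$(mktemp -d)   # hypothetical throwaway repo for the demo
git init -q "$REPO"
cd "$REPO"
git config user.name "backup"
git config user.email "backup@example.invalid"

# Two unrelated one-commit backups, tagged "old" and "new".
for tag in old new; do
    git checkout -q --orphan tmp
    echo "contents of $tag" > data.txt
    git add data.txt
    git commit -q -m "backup $tag"
    git tag "$tag"
    git checkout -q --detach
    git branch -q -D tmp
done

OLD_COMMIT=$(git rev-parse old)
OLD_BLOB=$(git rev-parse old:data.txt)

# Drop the only ref to the old backup.
git tag -d old >/dev/null

# HEAD's reflog would otherwise keep the old commit alive.
git reflog expire --expire=now --all
# Repack what is reachable and delete everything else right away.
git gc -q --prune=now

# cat-file -e exits non-zero when the object no longer exists.
if git cat-file -e "$OLD_COMMIT" 2>/dev/null ||
   git cat-file -e "$OLD_BLOB" 2>/dev/null; then
    STATUS=recoverable
else
    STATUS=purged
fi
echo "old backup: $STATUS"
```

This only shows that git no longer has the objects. Whether the
filesystem or disk still holds the deleted bytes is a separate
question that git doesn't address.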

You'll still be able to `git diff` between tags, even though they have
unrelated histories, and the files will still delta-compress.
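
A quick way to convince yourself of the diff part, again with a
made-up repo path, file name and tag names:

```shell
#!/bin/sh
set -eu

REPO=$(mktemp -d)   # hypothetical throwaway repo for the demo
git init -q "$REPO"
cd "$REPO"
git config user.name "backup"
git config user.email "backup@example.invalid"

# Two backups on separate orphan branches, so they share no history.
for tag in monday tuesday; do
    git checkout -q --orphan tmp
    printf 'line one\nrevision %s\n' "$tag" > dump.sql
    git add dump.sql
    git commit -q -m "backup $tag"
    git tag "$tag"
    git checkout -q --detach
    git branch -q -D tmp
done

# merge-base exits non-zero when there is no common ancestor.
if git merge-base monday tuesday >/dev/null; then
    RELATED=yes
else
    RELATED=no
fi

# diff compares the two trees and doesn't need a shared history.
DIFF=$(git diff monday tuesday -- dump.sql)
echo "related: $RELATED"
echo "$DIFF"
```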