Web lists-archives.com

[RFC PATCH] checkout: Force matching mtime between files

Currently git does not control mtimes of files being checked out.  This
means that the only assumption you could make is that all files created
or modified within a single checkout action will have mtime between
start time and end time of this checkout.  The relations between mtimes
of different files depend on the order in which they are checked out,
filesystem speed and timestamp precision.

Git repositories sometimes contain both generated and respective source
files.  For example, the Gentoo 'user syncing' repository combines
source ebuild files with generated metadata cache for user convenience.
Ideally, the 'git checkout' would be fast enough that (combined with low
timestamp precision) all files created or modified within a single
checkout would have matching timestamp.  However, in reality the cache
files may end up being wrongly 'older' than source file, requiring
unnecessary recheck.

The opposite problem (cache files not being regenerated when they should
have been) might also occur.  However, it could not be solved without
preserving timestamps, therefore it is out of scope of this change.

In order to avoid unnecessary cache mismatches, force a matching mtime
between all files created by a single checkout action.  This seems to be
the best course of action.  Matching mtimes do not trigger cache
updates.  They also match the concept of 'checkout' being an atomic
action.  Finally, this change does not break backwards compatibility
as the new result is a subset of the possible previous results.

For example, let's say that 'git checkout' creates two files in order,
with respective timestamps T1 and T2.  Before this patch, T1 <= T2.
After this patch, T1 == T2 which also satisfies T1 <= T2.

A similar problem was previously being addressed via git-restore-mtime
tool.  However, that solution is unnecessarily complex for Gentoo's use
case and does not seem to be suitable for 'seamless' integration.

The patch integrates mtime forcing via adding an additional member of
'struct checkout'.  This seemed the simplest way of adding it without
having to modify prototypes and adjust multiple call sites.  The member
is set to the current time in check_updates() function and is afterwards
enforced by checkout_entry().

The patch uses utime() rather than futimes() as that seems to be
the function used everywhere else in git and provided some MinGW
compatibility code.

Signed-off-by: Michał Górny <mgorny@xxxxxxxxxx>
 cache.h        |  1 +
 entry.c        | 12 +++++++++++-
 unpack-trees.c |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index bbaf5c349..9f0a7c867 100644
--- a/cache.h
+++ b/cache.h
@@ -1526,6 +1526,7 @@ struct checkout {
 	const char *base_dir;
 	int base_dir_len;
 	struct delayed_checkout *delayed_checkout;
+	time_t checkout_mtime;
 	unsigned force:1,
diff --git a/entry.c b/entry.c
index 2101201a1..7ee5a6d28 100644
--- a/entry.c
+++ b/entry.c
@@ -411,6 +411,7 @@ int checkout_entry(struct cache_entry *ce,
 	static struct strbuf path = STRBUF_INIT;
 	struct stat st;
+	int ret;
 	if (topath)
 		return write_entry(ce, topath, state, 1);
@@ -473,5 +474,14 @@ int checkout_entry(struct cache_entry *ce,
 		return 0;
 	create_directories(path.buf, path.len, state);
-	return write_entry(ce, path.buf, state, 0);
+	ret = write_entry(ce, path.buf, state, 0);
+	if (ret == 0 && state->checkout_mtime != 0) {
+		struct utimbuf t;
+		t.modtime = state->checkout_mtime;
+		if (utime(path.buf, &t) < 0)
+			warning_errno("failed utime() on %s", path.buf);
+	}
+	return ret;
diff --git a/unpack-trees.c b/unpack-trees.c
index e73745051..e1efefb68 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -346,6 +346,7 @@ static int check_updates(struct unpack_trees_options *o)
 	state.quiet = 1;
 	state.refresh_cache = 1;
 	state.istate = index;
+	state.checkout_mtime = time(NULL);
 	progress = get_progress(o);