Web lists-archives.com

Re: Optimizing writes to unchanged files during merges?




On Fri, Apr 13, 2018 at 12:02 AM, Elijah Newren <newren@xxxxxxxxx> wrote:
>
> I hope you don't mind me barging into your conversation

I was getting tired of my own rambling anyway..

> However, it turns out we have this awesome function called
> "was_tracked(const char *path)" that was intended for answering this
> exact question.  So, assuming was_tracked() isn't buggy, the correct
> patch for this problem would look like:

Apparently that causes problems, for some odd reason.

I like the notion of checking the index, but it's not clear that the
index is reliable in the presence of renames either.

>   A big series
> including that patch was merged to master two days ago, but
> unfortunately that exact patch was the one that caused some
> impressively awful fireworks[1].

Yeah, so this code is fragile.

How about we take a completely different approach? Instead of relying
on fragile (but clever) tests, why not rely on stupid brute force?

Yeah, yeah, it's bad to be stupid, but sometimes simple and stupid
really does work.

See the attached patch. It gets rid of all the subtle "has this been
renamed" tests entirely, and just _always_ does that final
update_file().

But instead, it makes update_file() a bit smarter, and says "before we
write this file out, let's see if it's already there and already has
the expected contents"?

Now, it really shouldn't be _quite_ as stupid as that: we should
probably set a flag in the "hey, the oid matches, maybe it's worth
checking", so that it doesn't do the check in the cases where we know
the merge has done things.

But it's actually *fairly* low cost, because before it reads the file
it at least checks that file length matches the expected length (and
that the user permission bits match the expected mode).

So if the file doesn't match, most of the time the real cost will just
be an extra 'open/fstat/close()' sequence. That's pretty cheap.

So even the completely stupid approach is probably not too bad, and it
could be made smarter.

NOTE! I have *NOT* tested this on anything interesting. I tested the
patch on my stupid test-case, but that'[s it. I didn't even bother
re-doing the kernel merge that started this.

Comments? Because considering the problems this code has had, maybe
"stupid" really is the right approach...

[ Ok, I lied. I just tested this on the kernel merge. It worked fine,
and avoided modifying <linux/mm.h> ]

                     Linus
 merge-recursive.c | 42 +++++++++++++++++++++++++++++-------------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 0c0d48624..ed2200065 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -815,6 +815,32 @@ static int make_room_for_path(struct merge_options *o, const char *path)
 	return err(o, msg, path, _(": perhaps a D/F conflict?"));
 }
 
+static int working_tree_matches(const char *path, const char *buf, unsigned long size, unsigned mode)
+{
+	int fd, matches;
+	struct stat st;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return 0;
+	matches = 0;
+	if (!fstat(fd, &st) && st.st_size == size && S_ISREG(st.st_mode) && !(0700 & (st.st_mode ^ mode))) {
+		char tmpbuf[1024];
+		while (size) {
+			int n = read(fd, tmpbuf, sizeof(tmpbuf));
+			if (n <= 0 || n > size)
+				break;
+			if (memcmp(tmpbuf, buf, n))
+				break;
+			buf += n;
+			size -= n;
+		}
+		matches = !size;
+	}
+	close(fd);
+	return matches;
+}
+
 static int update_file_flags(struct merge_options *o,
 			     const struct object_id *oid,
 			     unsigned mode,
@@ -856,6 +882,8 @@ static int update_file_flags(struct merge_options *o,
 				size = strbuf.len;
 				buf = strbuf_detach(&strbuf, NULL);
 			}
+			if (working_tree_matches(path, buf, size, mode))
+				goto free_buf;
 		}
 
 		if (make_room_for_path(o, path) < 0) {
@@ -1782,20 +1810,8 @@ static int merge_content(struct merge_options *o,
 
 	if (mfi.clean && !df_conflict_remains &&
 	    oid_eq(&mfi.oid, a_oid) && mfi.mode == a_mode) {
-		int path_renamed_outside_HEAD;
 		output(o, 3, _("Skipped %s (merged same as existing)"), path);
-		/*
-		 * The content merge resulted in the same file contents we
-		 * already had.  We can return early if those file contents
-		 * are recorded at the correct path (which may not be true
-		 * if the merge involves a rename).
-		 */
-		path_renamed_outside_HEAD = !path2 || !strcmp(path, path2);
-		if (!path_renamed_outside_HEAD) {
-			add_cacheinfo(o, mfi.mode, &mfi.oid, path,
-				      0, (!o->call_depth), 0);
-			return mfi.clean;
-		}
+		/* We could set a flag here and pass it to "update_file()" */
 	} else
 		output(o, 2, _("Auto-merging %s"), path);