Web lists-archives.com

Re: [PATCH v1] diffcore-rename: speed up register_rename_src




On Tue, Apr 18, 2017 at 07:45:05PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > On Tue, Apr 18, 2017 at 07:44:21PM +0000, git@xxxxxxxxxxxxxxxxx wrote:
> >
> >> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> >> 
> >> Teach register_rename_src() to see if new file pair
> >> can simply be appended to the rename_src[] array before
> >> performing the binary search to find the proper insertion
> >> point.
> >
> > I guess your perf results show some minor improvement. But I suspect
> > this is because your synthetic repo does not resemble the real world
> > very much. You're saving a few strcmps, but for each of those files
> > you're potentially going to have actually zlib inflate the object
> > contents and do similarity analysis.
> >
> > So "absurd number of files doing 100% exact renames" is the absolute
> > best case, and it saves a few percent.
> >
> > I dunno. It is not that much code _here_, but I'm not excited about the
> > prospect of sprinkling this same "check the last one" optimization all
> > over the code base. I wonder if there's some way to generalize it.
> 
> When adding many things, we often just append and then sort at the
> end after we finished adding.  I wonder if recent "check the last
> one and append" optimization beats that strategy.

The big question is whether we need to detect duplicates while we're
appending to the list, which is hard on an unsorted list.  In this
function, at least, we do detect when the path already exists and return
the existing entry. I'm not sure under what circumstances we would see
such a duplicate, though, as each filename should appear only once in
the tree diff. I would think.

Doing:

diff --git a/diffcore-rename.c b/diffcore-rename.c
index f7444c86b..56a493d97 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -86,7 +86,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
 		struct diff_rename_src *src = &(rename_src[next]);
 		int cmp = strcmp(one->path, src->p->one->path);
 		if (!cmp)
-			return src;
+			die("BUG: duplicate rename src: %s", one->path);
 		if (cmp < 0) {
 			last = next;
 			continue;

passes the test suite, at least. :)

-Peff