Optimizing writes to unchanged files during merges?

So I just had an interesting experience that has happened before too,
but this time I decided to try to figure out *why* it happened.

I'm obviously in the latter part of the kernel merge window, and
things are slowly calming down. I do the second XFS merge during this
window, and it brings in updates just to the fs/xfs/ subdirectory, so
I expect that my test build for the full kernel configuration will take
about a minute.

Instead, it recompiles pretty much *everything*, and the test build
takes 22 minutes.

This happens occasionally, and I blame gremlins. But this time I
decided to look at what the gremlins actually *are*.

The diff that git shows for the pull was very clear: only fs/xfs/
changes. But doing

  ls -tr $(git ls-files '*.[chS]') | tail -10

gave the real reason: in between all the fs/xfs/xyz files was this:

    include/linux/mm.h

and yeah, that rather core header file causes pretty much everything
to be re-built.

Now, the reason it was marked as changed is that the xfs branch _had_
in fact changed it, but the changes were already upstream and got
merged away. But the file still got written out (with the same
contents it had before the merge), and 'make' obviously only looks at
modification time, so make rebuilt everything.
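To make the mechanism concrete, here is a simplified C model of make's out-of-date check. It is an illustration, not make's source: it assumes whole-second timestamps and ignores pattern rules, and the name needs_rebuild() is hypothetical. The point is that file contents never enter the decision, so a rewrite with identical bytes still bumps mtime and forces a rebuild.

```c
#include <stddef.h>
#include <time.h>

/*
 * Simplified model of make's rule (illustration only, not make's code):
 * a target is out of date if it does not exist, or if any prerequisite
 * has a modification time strictly newer than the target's.
 * Only timestamps are compared; file contents are never looked at.
 */
int needs_rebuild(int target_exists, time_t target_mtime,
                  const time_t *prereq_mtimes, size_t n_prereqs)
{
    if (!target_exists)
        return 1;
    for (size_t i = 0; i < n_prereqs; i++) {
        if (prereq_mtimes[i] > target_mtime)
            return 1; /* a newer prerequisite forces a rebuild */
    }
    return 0; /* every prerequisite is older or equal: up to date */
}
```

If one prerequisite stands in for include/linux/mm.h, rewriting that header with unchanged contents only moves its entry in prereq_mtimes forward, and that alone flips the result to 1 for every object that includes it.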

Now, because it's still the merge window, I don't have much time to
look into this, but I was hoping somebody in git land would like to
give it a quick look. I'm sure I'm not the only one to have ever been
hit by this, and I don't think the kernel is the only project to hit
it either.

Because it would be lovely if the merging logic would just notice "oh,
that file doesn't change", and not even write out the end result.

For testing, the merge that triggered this git introspection is kernel
commit 80aa76bcd364 ("Merge tag 'xfs-4.17-merge-4' of
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux"), which can act as a
test case. It's a clean merge, so no kernel knowledge is necessary: just
re-merge the parents and see if the modification time of
include/linux/mm.h changes.

I'm guessing some hack in update_file_flags() to say "if the new
object matches the old one, don't do anything" might work, although I
didn't check whether we perhaps end up writing to the working tree copy
even earlier.
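A minimal sketch of the "don't write if unchanged" idea, assuming the merged contents are already in memory. write_if_changed() is a hypothetical helper, not git's update_file_flags() or any other git function. The suggestion above compares objects; this stand-in compares bytes against the file already on disk, which has the same effect on the working tree: an identical file is never opened for writing, so its mtime is preserved.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of git): write `len` bytes from `buf`
 * to `path`, but only if the file does not already hold exactly those
 * bytes.  Returns 0 if the file was left untouched, 1 if it was
 * (re)written, -1 on error.
 */
int write_if_changed(const char *path, const char *buf, size_t len)
{
    FILE *f = fopen(path, "rb");
    if (f) {
        int same = 1;
        size_t i = 0;
        int c;
        /* Compare the existing contents byte by byte with the new buffer. */
        while ((c = fgetc(f)) != EOF) {
            if (i >= len || (unsigned char)buf[i] != (unsigned char)c) {
                same = 0; /* differing byte, or existing file is longer */
                break;
            }
            i++;
        }
        if (same && i != len)
            same = 0; /* existing file is shorter than the new contents */
        fclose(f);
        if (same)
            return 0; /* identical: skip the write, mtime stays as it was */
    }

    /* File is missing or differs: write the new contents. */
    f = fopen(path, "wb");
    if (!f)
        return -1;
    if (fwrite(buf, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;
    return 1;
}
```

The design trade-off: re-reading the old file costs I/O, whereas comparing the old and new object IDs (as suggested for update_file_flags()) would answer the same question without touching the disk.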

Looking at the blame output of that function, most of it is really
old, so this "problem" goes back a long long time.

Just to clarify: this is absolutely not a correctness issue. It's not
even a *git* performance issue. It's literally just a matter of "not
updating the working tree when things don't change results in better
behavior for other tools".

So if somebody gets interested in this problem, that would be great.
And if not, I'll hopefully remember to come back to this next week
when the merge window is over, and take a second look.

                     Linus