Web lists-archives.com

Re: Optimizing writes to unchanged files during merges?




On Thu, Apr 12, 2018 at 5:01 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> [ Still talking to myself. Very soothing. ]
>
> On Thu, Apr 12, 2018 at 4:55 PM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> [ Talking to myself ]
>>
>> Did it perhaps mean to say
>>
>>                 path_renamed_outside_HEAD = path2 && !strcmp(path, path2);
>>
>> instead?
>
> Probably not correct, but making that change makes my test-case work.
>
> It probably breaks something important, though. I didn't look at why
> that code happened in the first place.
>
> But it's nice to know I'm at least looking at the right code while I'm
> talking to myself.

I hope you don't mind me barging into your conversation, but I figured
out some interesting details.

What we want here is that if there are no content conflicts and the
contents match the version of the file from HEAD, and there are no
mode conflicts and the final mode matches what was in HEAD, and there
are no path conflicts (e.g. a rename/rename(1to2) issue or a D/F
conflict putting a directory in the way) and the final path matches
what was already in HEAD, then we can skip the update.

What does this look like in code?  Well, most of it was already there:

if (mfi.clean && !df_conflict_remains &&
    oid_eq(&mfi.oid, a_oid) && mfi.mode == a_mode) {

That covers everything except "the final path matches what was already
in HEAD".  How do we check for that?  The current code is just making
an approximation; your modification improves the approximation.
However, it turns out we have this awesome function called
"was_tracked(const char *path)" that was intended for answering this
exact question.  So, assuming was_tracked() isn't buggy, the correct
patch for this problem would look like:

-               path_renamed_outside_HEAD = !path2 || !strcmp(path, path2);
-               if (!path_renamed_outside_HEAD) {
+               if (was_tracked(path)) {

Turns out that patch was already submitted as c5b761fb
("merge-recursive: ensure we write updates for directory-renamed
file", 2018-02-14).  While that patch was for a different bug, it is
identical to what I would have proposed to fix this bug.  A big series
including that patch was merged to master two days ago, but
unfortunately that exact patch was the one that caused some
impressively awful fireworks[1].  So it, along with the rest of the
series it was part of, was reverted just a few hours later.  As it
turns out, the cause of the problem is that was_tracked() can lie when
renames are involved.  It just hadn't ever bit us yet.

I have a fix for was_tracked(), and 10 additional testcases covering
interesting should-be-skipped and
should-not-be-skipped-but-is-close-to-a-should-be-skipped scenarios,
verifying the skipping actually happens if and only if it should
happen.  That should fix your bug, and the bug with my series.  Rough
WIP can be seen at the testme branch in github.com/newren/git for the
curious, but I need to sleep and have a bunch of other things on my
plate for the next few days.  But I thought I'd at least mention what
I've found so far.

Elijah

[1] https://public-inbox.org/git/xmqqefjm3w1h.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/