Re: [PATCH 12/17] Documentation: describe split commit-graphs
- Date: Wed, 8 May 2019 15:00:11 -0400
- From: Derrick Stolee <stolee@xxxxxxxxx>
- Subject: Re: [PATCH 12/17] Documentation: describe split commit-graphs
On 5/8/2019 1:20 PM, SZEDER Gábor wrote:
> On Wed, May 08, 2019 at 08:53:57AM -0700, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> Consider the following sequence of events:
> 1. There are three commit-graph files in the repository.
> 2. A git process opens the base commit-graph and commit-graph-1 for
> reading. It doesn't yet open commit-graph-2, because the (for
> arguments sake not very fair) scheduler takes the CPU away.
> 3. Meanwhile, a 'git fetch', well, fetches from a remote, and
> upon noticing that it got a lot of commits it decides to collapse
> commit-graph-1 and -2 and the new commits, writing a brand new
> 4. A second fetch fetches from a second remote, and writes
> commit-graph-2 (no collapsing this time).
> 5. Now the crappy scheduler finally decides that it's time to wake
> up the waiting git process from step 2, which then finds the new
> commit-graph-2 file and opens it for reading.
> 6. At this point this poor git process has file handles for:
> - the base commit-graph file, which is unchanged.
> - the old commit-graph-1 which has since been replaced, and does
> not yet contain info about the old commit-graph-2 or the
> commits received in the first fetch.
> - the new commit-graph-2, containing info only about commits
> received in the second fetch, and whose parents' graph
> positions point either to the base commitg-graph (good, since
> unchanged) or to the new commit-graph-1 (uh-oh).
> What happens next? If this process tries to access the parent of a
> commit from commit-graph-2, and the metadata about this parent is in
> the new commit-graph-1, then I expect all kinds of weird bugs.
> But will a git process ever try to access a commit that didn't yet
> existed in the repository when it started opening the commit-graph
I'll ignore the improbability of this turn of events (two writes happening
during the span of trying to read two files) and focus on the fact that
we can prevent issues here using the 4th TODO item in my cover letter:
4. It would be helpful to add a new optional chunk that contains the
trailing hash for the lower level of the commit-graph stack. This chunk
would only be for the commit-graph-N files, and would provide a simple
way to check that the stack is valid on read, in case we are still
worried about other processes reading/writing in the wrong order.
If we have this chunk -- you have convinced me that we need it -- then we
could ignore the "new" commit-graph-2 because its base graph hash does not
match. We can continue without dying because we can always parse the "missing"
commits from the packs.