Re: [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes
- Date: Mon, 8 Jan 2018 11:32:26 -0800
- From: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
- Subject: Re: [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes
On Sun, 7 Jan 2018 13:14:42 -0500
Derrick Stolee <stolee@xxxxxxxxx> wrote:
> +Design Details
> +- The MIDX file refers only to packfiles in the same directory
> + as the MIDX file.
> +- A special file, 'midx-head', stores the hash of the latest
> + MIDX file so we can load the file without performing a dirstat.
> + This file is especially important with incremental MIDX files,
> + pointing to the newest file.
I presume that the actual MIDX files are named by hash? (You might have
written this somewhere that I somehow missed.)
Also, I notice that in the "Future Work" section, the possibility of
multiple MIDX files is raised. Could this 'midx-head' file be allowed to
store multiple such files? That way, we avoid a bit of file format
churn (in that we won't need to define a new "chunk" in the future).
> +- If a packfile exists in the pack directory but is not referenced
> + by the MIDX file, then the packfile is loaded into the packed_git
> + list and Git can access the objects as usual. This behavior is
> + necessary since other tools could add packfiles to the pack
> + directory without notifying Git.
> +- The MIDX file should be only a supplemental structure. If a
> + user downgrades or disables the `core.midx` config setting,
> + then the existing .idx and .pack files should be sufficient
> + to operate correctly.
Let me try to summarize: so, at this point, there are no
backwards-incompatible changes to the repo disk format. Unupdated code
paths (and old versions of Git) can just read the .idx and .pack files,
as always. Updated code paths will look at the .midx and .idx files, and
will sort them as follows:
- .midx files go into a data structure
- .idx files not referenced by any .midx files go into the
existing packed_git data structure
A writer can either merely write a new packfile (like old versions of
Git) or write a packfile and update the .midx file, and everything above
will still work. In the event that a writer deletes an existing packfile
referenced by a .midx (for example, old versions of Git during a
repack), we will lose the advantages of the .midx file - we will detect
that the .midx no longer works when attempting to read an object given
its information, but in this case, we can recover by dropping the .midx
file and loading all the .idx files it references that still exist.
As a reviewer, I think this is a very good approach, and this does make
things easier to review (as opposed to, say, an approach where a lot of
the code must be aware of .midx files).