Re: [PATCH 02/23] midx: add midx format details to pack-format.txt




Hi Derrick,

On Thu, Jun 7, 2018 at 7:03 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> The multi-pack-index (MIDX) feature generalizes the existing pack-
> index (IDX) feature by indexing objects across multiple pack-files.
>
> Describe the basic file format, using a 12-byte header followed by
> a lookup table for a list of "chunks" which will be described later.
> The file ends with a footer containing a checksum using the hash
> algorithm.
>
> The header allows later versions to create breaking changes by
> advancing the version number. We can also change the hash algorithm
> using a different version value.
>
> We will add the individual chunk format information as we introduce
> the code that writes that information.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  Documentation/technical/pack-format.txt | 49 +++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
>
> diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
> index 70a99fd142..17666b4bfc 100644
> --- a/Documentation/technical/pack-format.txt
> +++ b/Documentation/technical/pack-format.txt
> @@ -252,3 +252,52 @@ Pack file entry: <+
>      corresponding packfile.
>
>      20-byte SHA-1-checksum of all of the above.
> +
> +== midx-*.midx files have the following format:
> +
> +The meta-index files refer to multiple pack-files and loose objects.

So is it meta or multi?

> +In order to allow extensions that add extra data to the MIDX, we organize
> +the body into "chunks" and provide a lookup table at the beginning of the
> +body. The header includes certain length values, such as the number of packs,
> +the number of base MIDX files, hash lengths and types.
> +
> +All 4-byte numbers are in network order.
> +
> +HEADER:
> +
> +       4-byte signature:
> +           The signature is: {'M', 'I', 'D', 'X'}
> +
> +       1-byte version number:
> +           Git only writes or recognizes version 1
> +
> +       1-byte Object Id Version
> +           Git only writes or recognizes verion 1 (SHA-1)

s/verion/version/

> +       1-byte number (C) of "chunks"
> +
> +       1-byte number (I) of base multi-pack-index files:
> +           This value is currently always zero.

Oh? Are meta-index and multi-index files different things?

> +       4-byte number (P) of pack files
> +
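
To check my reading of the header, here is a rough sketch of a parser for
the 12 bytes described above. It is in Python only for brevity; the names
(MidxHeader, parse_midx_header) are made up for illustration and are not
from the patch. It assumes every multi-byte field is big-endian, as the
"network order" sentence suggests.

```python
import struct
from typing import NamedTuple

# Field order follows the quoted HEADER section: 4-byte signature, then
# four 1-byte fields (version, object id version, C, I), then 4-byte P.
HEADER_FORMAT = ">4sBBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 1 + 1 + 1 + 1 + 4 = 12


class MidxHeader(NamedTuple):  # hypothetical name, not from the patch
    version: int
    oid_version: int
    num_chunks: int      # C
    num_base_midx: int   # I, "currently always zero"
    num_packs: int       # P


def parse_midx_header(data: bytes) -> MidxHeader:
    """Decode and sanity-check the 12-byte header at the start of data."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for a MIDX header")
    signature, version, oid_version, num_chunks, num_base, num_packs = (
        struct.unpack_from(HEADER_FORMAT, data, 0)
    )
    if signature != b"MIDX":
        raise ValueError("bad signature: %r" % signature)
    # The patch says only version 1 and object id version 1 (SHA-1)
    # are written or recognized.
    if version != 1:
        raise ValueError("unsupported version: %d" % version)
    if oid_version != 1:
        raise ValueError("unsupported object id version: %d" % oid_version)
    return MidxHeader(version, oid_version, num_chunks, num_base, num_packs)
```

With this layout there is no padding anywhere, so the chunk lookup would
start at byte 12.
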
> +CHUNK LOOKUP:
> +
> +       (C + 1) * 12 bytes providing the chunk offsets:
> +           First 4 bytes describe chunk id. Value 0 is a terminating label.
> +           Other 8 bytes provide offset in current file for chunk to start.
> +           (Chunks are provided in file-order, so you can infer the length
> +           using the next chunk position if necessary.)

It is nice that the header is also 12 bytes, the same size as a lookup
table entry, so it fits right into the lookup table. An alternative point
of view:

  If a chunk needs to store more than 8 bytes, the 4 bytes that identify
  the chunk are followed by an 8-byte offset; otherwise the 8 bytes of
  information can be stored directly after those 4 bytes.
  "MIDX" is a special chunk that must come first (does it?) and appear
  only once, as it contains the version number.
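
For reference, this is roughly how I picture a reader walking the lookup
table as the patch describes it (not the alternative view above). It is a
Python sketch with made-up names. Two assumptions that the text does not
spell out: the 8-byte offsets are big-endian like the 4-byte numbers, and
the offset stored in the terminating entry marks where the last chunk ends.

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 12                             # the lookup follows the header
ENTRY_FORMAT = ">IQ"                         # 4-byte chunk id, 8-byte offset
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)   # 12


def read_chunk_lookup(data: bytes, num_chunks: int) -> List[Tuple[int, int, int]]:
    """Return (chunk_id, offset, length) for each of the C chunks.

    The table has C + 1 entries; the last one has id 0 and, by assumption,
    its offset is the end of the final chunk.
    """
    entries = [
        struct.unpack_from(ENTRY_FORMAT, data, HEADER_SIZE + i * ENTRY_SIZE)
        for i in range(num_chunks + 1)
    ]
    if entries[-1][0] != 0:
        raise ValueError("chunk lookup is missing its terminating entry")

    chunks = []
    # Chunks are in file order, so each length is the distance to the
    # next entry's offset.
    for (chunk_id, start), (_, next_start) in zip(entries, entries[1:]):
        if chunk_id == 0:
            raise ValueError("terminating label before the end of the table")
        if next_start < start:
            raise ValueError("chunk offsets are not in file order")
        chunks.append((chunk_id, start, next_start - start))
    return chunks
```

If the terminating entry does not carry a meaningful offset, the length of
the last chunk would have to be derived from the file size minus the
trailer instead.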

> +       The remaining data in the body is described one chunk at a time, and
> +       these chunks may be given in any order. Chunks are required unless
> +       otherwise specified.
> +
> +CHUNK DATA:
> +
> +       (This section intentionally left incomplete.)
> +
> +TRAILER:
> +
> +       H-byte HASH-checksum of all of the above.

This means we have to rehash the whole file whenever we update its
contents. Okay.
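
To illustrate the cost, a minimal Python sketch of writing and checking
such a trailer, assuming H = 20 and HASH = SHA-1 as for object id
version 1 (the function names are hypothetical):

```python
import hashlib

HASH_LEN = 20  # H for SHA-1; an assumption tied to object id version 1


def append_trailer(body: bytes) -> bytes:
    """Return body followed by the SHA-1 of everything before the trailer."""
    return body + hashlib.sha1(body).digest()


def verify_trailer(data: bytes) -> bool:
    """Check that the last HASH_LEN bytes hash everything before them."""
    if len(data) < HASH_LEN:
        return False
    body, trailer = data[:-HASH_LEN], data[-HASH_LEN:]
    return hashlib.sha1(body).digest() == trailer
```

Because the checksum covers every preceding byte, changing even one byte
of the header or of a chunk invalidates the trailer, so a writer has to
stream the entire file through the hash again.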