
Re: Confusion about the PACK format





On 10/02/2019 16:02, Florian Steenbuck wrote:
> Hello to all,
> 
> I am trying to understand the git protocol, only on the server side.
> I started without reading any docs, which turned out to be fine until
> I got to the PACK format (a pretty early failure, I know).
> 
> I have read this documentation:
> https://raw.githubusercontent.com/git/git/c4df23f7927d8d00e666a3c8d1b3375f1dc8a3c1/Documentation/technical/pack-format.txt
> 
> But there is some confusion about this text.
> 
> The basic header is no problem, but somehow I got stuck while trying
> to read the length and type of the objects, which are ints that can be
> resolved with 3 bits and 4 bits. The question is where and how?
> 

Hmm, the 'type and length' encoding could be described more clearly!
Hopefully, just on this issue, the following could help:

In my git.git repo, which is fully packed, I have a single pack file, with

  $ git count-objects -v
  count: 0
  size: 0
  in-pack: 270277
  packs: 1
  size-pack: 101929
  prune-packable: 0
  garbage: 0
  size-garbage: 0
  $ 

... 270277 objects in it. The beginning of the file looks like:

  $ xxd .git/objects/pack/pack-d554e6d8335601c2525b40487faf36493094ab50.pack | head
  00000000: 5041 434b 0000 0002 0004 1fc5 9d13 789c  PACK..........x.
  00000010: 9d8f cd6a c330 1084 ef7a 8a3d 171a b4ab  ...j.0...z.=....
  00000020: 9525 8750 0abd 945c f304 ab95 5cfb 602b  .%.P...\....\.`+
  00000030: b84a 7fde 3e2a 943e 406f c3f0 cd30 d3f6  .J..>*.>@o...0..
  00000040: 5260 741a 5025 92e2 1458 917c c294 a3c3  R`t.P%...X.|....
  00000050: 4803 e521 395f c2d8 4d73 95bd 6c0d 82f5  H..!9_..Ms..l...
  00000060: 6172 310f 0529 7a2f d6a7 40c5 d9a0 d185  ar1..)z/..@.....
  00000070: 622d 8789 9cb8 3f1e 5132 6366 4de4 8531  b-....?.Q2cfM..1
  00000080: 114a 70ec 9447 2f5a 526f e29c 3847 23b7  .Jp..G/ZRo..8G#.
  00000090: 36d7 1dce b76d a9f0 02af b2ca 56e1 f4b6  6....m......V...
  $ 

You can see the header, which consists of three 32-bit values: the
packfile signature '5041 434b' ("PACK"), then the version number
'0000 0002', followed by the number of objects '0004 1fc5', which
is 270277. Next comes the first 'object entry', which starts '9d13'.
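
If it helps to see the header read programmatically, here is a minimal
Python sketch (Python is my choice for illustration, not something git
itself uses). The 12 bytes are copied from the xxd dump above, and the
variable names are just placeholders:

```python
import struct

# First 12 bytes of the pack file, copied from the xxd dump above.
header = bytes.fromhex("5041434b" "00000002" "00041fc5")

# Three fields in network (big-endian) byte order: a 4-byte signature
# followed by two unsigned 32-bit integers.
signature, version, num_objects = struct.unpack(">4sII", header)

print(signature, version, num_objects)   # b'PACK' 2 270277
```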

Now, the 'n-byte type and length' is a variable-length encoding of
the object type and length. The number of bytes used to encode this
data is content dependent. If the top bit of a byte is set, then we
need to process the next byte, otherwise we are done. So, looking
at the first 'object entry' byte (at offset 12) '9d', we shift right
4 bits to get the top nibble and mask off the top bit to get the
object type, i.e. (0x9d >> 4) & 7, which gives an object type of 1
(which is a commit object). The lower nibble of the first byte
contains the first (or only) 4 bits of the size, here (0x9d & 15)
which is 0xd. Given that the top bit of this byte is set, we now
process the next byte. After the first byte, each byte contains 7
bits of the size field, which are combined with the value from the
previous bytes by shifting and adding (first by 4 bits, then 11, 18,
25 etc.). The second byte, '13', has its top bit clear, so it is the
last one, and in this case we have (0x13 << 4) + 0xd = 317.
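
The decoding steps above can be sketched in a few lines of Python.
This is only an illustration of the description in this mail, not
git's own code; the function name decode_type_and_size and the
TYPE_NAMES table are made up for the example, and the input bytes are
the four bytes at offset 12 in the dump:

```python
# Object type numbers as listed in the pack-format documentation.
TYPE_NAMES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
              6: "ofs-delta", 7: "ref-delta"}

def decode_type_and_size(buf, offset=0):
    """Decode the variable-length type/size header of one object entry.

    Returns (type, size, offset of the first byte after the header).
    """
    byte = buf[offset]
    offset += 1
    obj_type = (byte >> 4) & 0x07   # three type bits below the top bit
    size = byte & 0x0F              # lowest 4 bits of the size
    shift = 4
    while byte & 0x80:              # top bit set: another size byte follows
        byte = buf[offset]
        offset += 1
        size |= (byte & 0x7F) << shift   # 7 more size bits per byte
        shift += 7
    return obj_type, size, offset

entry = bytes.fromhex("9d13789c")        # bytes at offset 12 of the pack
obj_type, size, data_start = decode_type_and_size(entry)
print(TYPE_NAMES[obj_type], size, data_start)   # commit 317 2
```

The returned offset (2 here, so file offset 14) is where the
compressed data begins.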

The compressed data follows, '789c' ...

We can use git-verify-pack to confirm the details here:

  $ git verify-pack -v .git/objects/pack/pack-d554e6d8335601c2525b40487faf36493094ab50.idx | head -n 1
  878e2cd30e1656909c5073043d32fe9d02204daa commit 317 216 12
  $ 
 
So the object 878e2cd30e, at offset 12 in the file, is a commit object
with size 317 (which has an in-pack size of 216).

Hope this helps.

ATB,
Ramsay Jones