
Re: Confusion about the PACK format

On 10/02/2019 19:05, Ramsay Jones wrote:
> 
> 
> On 10/02/2019 16:02, Florian Steenbuck wrote:
>> Hello to all,
>>
>> I am trying to understand the git protocol, only on the server side. So I
>> started without reading any docs, which turned out to be fine until I got
>> to the PACK format (pretty early failure, I know).
>>
>> I have read this documentation:
>> https://raw.githubusercontent.com/git/git/c4df23f7927d8d00e666a3c8d1b3375f1dc8a3c1/Documentation/technical/pack-format.txt
>>
>> But some parts of this text confuse me.
>>
>> The basic header is no problem, but somehow I got stuck while trying to
>> read the length and type of the objects, which are ints that can be
>> resolved with 3 bits and 4 bits. The question is where and how?
>>
> 
> Hmm, the 'type and length' encoding could be described more clearly!
> Hopefully, just on this issue, the following could help:
> 
> In my git.git repo, which is fully packed, I have a single pack file, with
> 
>   $ git count-objects -v
>   count: 0
>   size: 0
>   in-pack: 270277
>   packs: 1
>   size-pack: 101929
>   prune-packable: 0
>   garbage: 0
>   size-garbage: 0
>   $ 
> 
> ... 270277 objects in it. The beginning of the file looks like:
> 
>   $ xxd .git/objects/pack/pack-d554e6d8335601c2525b40487faf36493094ab50.pack | head
>   00000000: 5041 434b 0000 0002 0004 1fc5 9d13 789c  PACK..........x.
>   00000010: 9d8f cd6a c330 1084 ef7a 8a3d 171a b4ab  ...j.0...z.=....
>   00000020: 9525 8750 0abd 945c f304 ab95 5cfb 602b  .%.P...\....\.`+
>   00000030: b84a 7fde 3e2a 943e 406f c3f0 cd30 d3f6  .J..>*.>@o...0..
>   00000040: 5260 741a 5025 92e2 1458 917c c294 a3c3  R`t.P%...X.|....
>   00000050: 4803 e521 395f c2d8 4d73 95bd 6c0d 82f5  H..!9_..Ms..l...
>   00000060: 6172 310f 0529 7a2f d6a7 40c5 d9a0 d185  ar1..)z/..@.....
>   00000070: 622d 8789 9cb8 3f1e 5132 6366 4de4 8531  b-....?.Q2cfM..1
>   00000080: 114a 70ec 9447 2f5a 526f e29c 3847 23b7  .Jp..G/ZRo..8G#.
>   00000090: 36d7 1dce b76d a9f0 02af b2ca 56e1 f4b6  6....m......V...
>   $ 
> 
> You can see the header, which consists of three 32-bit values: the
> packfile signature '5041 434b' ("PACK"), then the version number
> '0000 0002', followed by the number of objects '0004 1fc5', which
> is 270277. Next comes the first 'object entry', which starts '9d13'.
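As a rough illustration (not git's own code), the 12-byte header can be unpacked with Python's standard `struct` module, reading all three fields as big-endian (network byte order). The function name `parse_pack_header` is made up for this sketch.

```python
import struct
from typing import Tuple

def parse_pack_header(data: bytes) -> Tuple[int, int]:
    """Return (version, object_count) from the first 12 bytes of a packfile.

    Sketch only: 4-byte 'PACK' signature, then a 32-bit version and a
    32-bit object count, all big-endian, as in pack-format.txt.
    """
    if len(data) < 12:
        raise ValueError("need at least 12 bytes for the pack header")
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError(f"bad signature: {signature!r}")
    return version, count

# The first 12 bytes of the xxd dump above.
header = bytes.fromhex("5041434b 00000002 00041fc5")
print(parse_pack_header(header))  # (2, 270277)
```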
> 
> Now, the 'n-byte type and length' is a variable length encoding of
> the object type and length. The number of bytes used to encode this
> data is content dependent. If the top bit of a byte is set, then we
> need to process the next byte, otherwise we are done. So, looking
> at the first 'object entry' byte (at offset 12) '9d', we take the
> top nibble, remove the top bit, and shift right 4 bits to get the
> object type, i.e. (0x9d >> 4) & 7, which gives an object type of 1
> (which is a commit object). The lower nibble of the first byte
> contains the first (or only) 4 bits of the size, here (0x9d & 15)
> which is 0xd. Given that the top bit of this byte is set, we now
> process the next byte. After the first byte, each byte contains 7
> bits of the size field which is combined with the value from the
> previous byte by shifting and adding (first by 4 bits, then 11, 18,
> 25 etc.). So, in this case we have (0x13 << 4) + 0xd = 317.

Sorry, to be clear, I should have said, "mask off the top bit,
shift and add", so:

  ((0x13 & 0x7f) << 4) + 0xd = 317
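A minimal Python sketch of that decoding, assuming `data` holds the raw pack bytes and `offset` points at the start of an object entry. It masks off the continuation bit of every byte after the first, as in the corrected expression above. The type numbers follow pack-format.txt (5 is reserved); the names `OBJ_TYPES` and `read_type_and_size` are hypothetical.

```python
# Object type numbers as listed in pack-format.txt (5 is reserved).
OBJ_TYPES = {
    1: "commit",
    2: "tree",
    3: "blob",
    4: "tag",
    6: "ofs_delta",
    7: "ref_delta",
}

def read_type_and_size(data: bytes, offset: int):
    """Decode an object entry's variable-length type/size header.

    Returns (type_name, size, header_length): size is the uncompressed
    size, header_length the number of bytes the header occupied.
    """
    pos = offset
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7      # bits 4-6 of the first byte
    size = byte & 0x0F                # low nibble: first 4 bits of size
    shift = 4
    while byte & 0x80:                # top bit set: another byte follows
        byte = data[pos]
        pos += 1
        size += (byte & 0x7F) << shift  # mask off top bit, shift and add
        shift += 7                      # 4, then 11, 18, 25, ...
    return OBJ_TYPES.get(obj_type, f"unknown({obj_type})"), size, pos - offset

# Bytes at offset 12 of the pack shown above: 9d 13 78 9c ...
entry = bytes.fromhex("9d13789c")
print(read_type_and_size(entry, 0))  # ('commit', 317, 2)
```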

ATB,
Ramsay Jones

> 
> The compressed data follows, '789c' ...
> 
> We can use git-verify-pack to confirm the details here:
> 
>   $ git verify-pack -v .git/objects/pack/pack-d554e6d8335601c2525b40487faf36493094ab50.idx | head -n 1
>   878e2cd30e1656909c5073043d32fe9d02204daa commit 317 216 12
>   $ 
>  
> So the object 878e2cd30e, at offset 12 in the file, is a commit object
> with size 317 (which has an in-pack size of 216).
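The entry header gives only the uncompressed size, not the compressed length, so a reader has to inflate the zlib stream (the '789c' above) to learn where the next entry begins. Below is a hedged Python sketch using the standard `zlib.decompressobj`. `inflate_entry` is a hypothetical name, and the demo uses synthetic data rather than the real pack. It ignores the base offset or base object name that pack-format.txt places between the entry header and the zlib data for the two delta types. I am also assuming that verify-pack's in-pack size (216 here) counts the entry header plus the compressed bytes; treat that reading as an assumption.

```python
import zlib

def inflate_entry(data: bytes, offset: int, expected_size: int):
    """Inflate the zlib stream starting at data[offset].

    Returns (payload, consumed): consumed is the number of compressed
    bytes, so anything after offset + consumed belongs to the next entry.
    """
    d = zlib.decompressobj()
    payload = d.decompress(data[offset:])
    if not d.eof:
        raise ValueError("truncated zlib stream")
    if len(payload) != expected_size:
        raise ValueError(f"size mismatch: {len(payload)} != {expected_size}")
    # Bytes past the end of the stream are left in unused_data.
    consumed = len(data) - offset - len(d.unused_data)
    return payload, consumed

# Synthetic example: a compressed body followed by "the next entry".
body = b"example object body\n" * 20
compressed = zlib.compress(body)
stream = compressed + b"NEXT"
payload, consumed = inflate_entry(stream, 0, len(body))
print(payload == body, consumed == len(compressed))  # True True
```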
> 
> Hope this helps.
> 
> ATB,
> Ramsay Jones
> 
>