Re: pack file object size question

On Sun, Dec 16, 2018 at 04:14:46PM -0800, Jonathan Nieder wrote:
> Hi,
> Farhan Khan wrote:
> >> Farhan Khan wrote:
> >>> I am having trouble figuring out the boundary between two objects in
> >>> the pack file.
> [...]
> >              I think the issue is, the compressed object has a fixed
> > size and git inflates it, then moves on to the next object. I am
> > trying to figure out where it identifies the size of the object.
> Do you mean the compressed size or uncompressed size?
> It sounds to me like pack-format.txt needs to do a better job of
> distinguishing the two.

How about something like this?

I wrote this mostly from memory (plus a very quick look at
index-pack), but I think we have never actually stored compressed
sizes. The "length" field (even in the loose format) always refers
to the uncompressed size.
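
To illustrate the loose-format case, here is a minimal sketch in Python
(not git's own code). It assumes the usual loose object layout, where the
zlib stream inflates to "<type> <size>\0<content>". The function name
parse_loose_object is hypothetical. The size in the header is compared
against the inflated content, not against the bytes on disk.

```python
import zlib

def parse_loose_object(raw):
    """Inflate a loose object and check its declared (uncompressed) size."""
    body = zlib.decompress(raw)
    # Header is "<type> <decimal size>", terminated by a NUL byte.
    header, _, content = body.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("declared size does not match inflated content")
    return obj_type.decode(), content
```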

-- 8< --
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index cab5bdd2ff..4fd49f61d6 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -31,6 +31,11 @@ Git pack format
 	 is an OBJ_OFS_DELTA object
      compressed delta data
+     Note: The length (in bytes) is of uncompressed objects or
+     deltified representation. We're supposed to reach the end of zlib
+     stream once we have inflated the given length, otherwise it's a
+     corrupted pack file.
      Observation: length of each object is encoded in a variable
      length format and is not constrained to 32-bit or anything.
@@ -199,7 +204,8 @@ Pack file entry: <+
 		is the size before compression).
 	If it is REF_DELTA, then
 	  20-byte base object name SHA-1 (the size above is the
-		size of the delta data that follows).
+		size of the delta data that follows, before
+		compression).
           delta data, deflated.
 	If it is OFS_DELTA, then
 	  n-byte offset (see below) interpreted as a negative
-- 8< --
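
For the original question (where one entry ends and the next begins),
here is a rough Python sketch, not what index-pack literally does. It
assumes a non-delta entry (commit, tree, blob or tag), so the zlib stream
follows the type/size header directly; for REF_DELTA and OFS_DELTA the
base name or offset sits in between. The names parse_entry_header and
inflate_entry are made up for illustration. The boundary is simply where
the zlib stream ends, and the length from the header is only used as a
sanity check on the inflated data.

```python
import zlib

def parse_entry_header(buf, pos):
    """Decode the type and uncompressed size of the entry starting at pos."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7   # 3-bit object type
    size = byte & 0x0F             # low 4 bits of the size
    shift = 4
    while byte & 0x80:             # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos

def inflate_entry(buf, pos, expected_size):
    """Inflate one zlib stream at pos; return (data, offset of next entry)."""
    d = zlib.decompressobj()
    data = d.decompress(buf[pos:])
    # The stream must end exactly when expected_size bytes were produced.
    if not d.eof or len(data) != expected_size:
        raise ValueError("corrupt pack entry")
    # Whatever zlib did not consume belongs to the next entry.
    end_pos = len(buf) - len(d.unused_data)
    return data, end_pos
```

Feeding the whole remainder of the buffer to zlib is wasteful for a real
pack; it just keeps the sketch short.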