Web lists-archives.com

Re: Reduce pack-objects memory footprint?

On Wed, Feb 28, 2018 at 04:27:22PM +0700, Duy Nguyen wrote:

> linux-2.6.git current has 6483999 objects. "git gc" on my poor laptop
> consumes 1.7G out of 4G RAM, pushing lots of data to swap and making
> all apps nearly unusuable (granted the problem is partly Linux I/O
> scheduler too). So I wonder if we can reduce pack-objects memory
> footprint a bit.

Yeah, the per object memory footprint is not great. Around 100 million
objects it becomes pretty ridiculous. I started to dig into it a year or
three ago when I saw such a case, but it turned out to be something that
we could prune. The torvalds/linux fork network has ~23 million objects,
so it's probably 7-8 GB of book-keeping. Which is gross, but 64GB in a
server isn't uncommon these days.

I think laptops repacking the kernel are probably one of the worst cases
(leaving aside the awful Windows repository, but my impression is that
they simply can't do a full repack at all there).

> This demonstration patch (probably breaks some tests) would reduce the
> size of struct object_entry from from 136 down to 112 bytes on
> x86-64. There are 6483999 of these objects, so the saving is 17% or
> 148 MB.

136 x 6.5M objects is only about 800MB. I suspect a big chunk of the
rest is going to the object structs we create when doing the internal
rev-list traversal. And those duplicate the 20-byte sha1s at the very

I don't know if it would be a good idea to free them after the
traversal, though. We do use them again later in the bitmap case. On the
other hand, we could probably do so for the non-bitmap case. And even
for the bitmap case, the primary value in keeping them around is that
the parent pointers will already be cached. So it might make sense to
free the blobs and trees (though it might be tricky; the commits have
pointers to the trees).

It also doesn't help with peak memory usage, because you'll have the
full to_pack list and all of the "struct object" in memory together at
one point.

Another option would be to somehow replace the pack_idx_entry with a
reference to a "struct object". That would let us at least avoid storing
the 20-byte oid twice.

> If we go further, notice that nr_objects is uint32_t, we could convert
> the three pointers
> 	struct object_entry *delta;
> 	struct object_entry *delta_child;
> 	struct object_entry *delta_sibling;
> to
> 	uint32_t delta;
> 	uint32_t delta_child;
> 	uint32_t delta_sibling;
> which saves 12 bytes (or another 74 MB). 222 MB total is plenty of
> space to keep some file cache from being evicted.

Yeah, that seems like low-hanging fruit. I'd also note that we don't
actually use all of the fields during the whole process. I think some of
those delta fields are only used for a short time. So we might be able
to reduce peak memory if there are some mutually exclusive bits of each
entry (and even if there's some overlap, we'd reduce the length of time
we'd need to be at peak).

> Is it worth doing this? The struct packing makes it harder to read
> (and more fragile too). I added some more artifical limit like max
> depth of 2^11. But I think 4096+ depth is getting unreasonable.

I'm OK with a limit like 4096, as long as we notice when we hit the
limit and behave reasonably. I think the algorithm in
break_delta_chains() may temporarily store up to uint32_t depth. But I
think we won't ever write anything into cur->depth larger than the
max-depth limit. So it would probably be sufficient to just check that
the --depth argument is reasonably sized and complain otherwise.

I do agree this makes things a bit harder to read, but I think the
benefits are pretty measurable. And may make a real usability difference
on a large repository.