Re: [PATCH 00/11] Reduce pack-objects memory footprint
- Date: Fri, 2 Mar 2018 05:57:10 -0500
- From: Jeff King <peff@xxxxxxxx>
- Subject: Re: [PATCH 00/11] Reduce pack-objects memory footprint
On Fri, Mar 02, 2018 at 07:14:01AM +0700, Duy Nguyen wrote:
> > We have a big repo, and this gets repacked on 6-8GB of memory on dev
> > KVMs, so we're under a fair bit of memory pressure. git-gc slows things
> > down a lot.
> > It would be really nice to have something that made it use drastically
> > less memory at the cost of less efficient packs. Is the property that
> Ahh.. less efficient. You may be more interested in  then. It
> avoids rewriting the base pack. Without the base pack, bookkeeping
> becomes much much cheaper.
> We still read every single byte in all packs though (I think, unless
> you use pack-bitmap) and this amount of I/O affects the rest of the
> system too. Perhaps reducing core.packedgitwindowsize might make it
> friendlier to the OS, I don't know.
Yes, the ".keep" thing is actually quite expensive. We still do a
complete rev-list to find all the objects we want, and then for each
object say "is this in a pack with .keep?". And worse, the mru doesn't
help there because even if we find it in the first pack, we have to keep
looking to see if it's also in _another_ pack, one that has a .keep.
There are probably some low-hanging optimizations there (e.g., only
looking in the .keep packs if that's all we're looking for; we may even
do that already).
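To make the cost concrete, here is a toy model in Python. It is not
git's actual code; the Pack class, the function names and the probe
counting are all made up for illustration. It only shows why an
MRU-ordered pack list stops helping once the question is "is this in a
pack with .keep?", and what restricting the search to .keep packs
would buy.

```python
from dataclasses import dataclass, field


@dataclass
class Pack:
    """Hypothetical stand-in for one packfile: a name, its object ids, a .keep flag."""
    name: str
    objects: set = field(default_factory=set)
    keep: bool = False


def in_any_pack(oid, packs):
    """Plain lookup: stop at the first pack that has the object.

    Returns (found, probes). With MRU ordering the hit is usually in
    one of the first packs, so probes stays small.
    """
    probes = 0
    for pack in packs:
        probes += 1
        if oid in pack.objects:
            return True, probes
    return False, probes


def in_keep_pack_naive(oid, packs):
    """Keep check over the full list: a hit in a non-.keep pack proves
    nothing, so the scan has to continue past it."""
    probes = 0
    for pack in packs:
        probes += 1
        if pack.keep and oid in pack.objects:
            return True, probes
    return False, probes


def in_keep_pack_filtered(oid, packs):
    """Keep check that only probes packs marked .keep."""
    probes = 0
    for pack in packs:
        if not pack.keep:
            continue
        probes += 1
        if oid in pack.objects:
            return True, probes
    return False, probes


# MRU order: the most recently used pack comes first, the .keep base last.
packs = [
    Pack("mru", {"a", "b"}),
    Pack("mid", {"c"}),
    Pack("base", {"a", "c"}, keep=True),
]
```

With that layout, in_any_pack("a", packs) is done after one probe, while
in_keep_pack_naive("a", packs) has to probe all three packs before it
reaches the .keep one. The filtered variant probes only "base".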
But I think fundamentally you'd do much better to generate the partial
list of objects outside of pack-objects entirely, and then just feed it
to pack-objects without using "--revs".
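Roughly what I have in mind, as an untested Python sketch. The git
commands used (rev-list --objects, show-index, pack-objects reading
object names on stdin) are real; the helper names, the rev_args
parameter and the choice to filter against the .idx files of the kept
packs are my assumptions, and how you arrive at the partial list is
really up to you. It also assumes a SHA-1 repository.

```python
import subprocess


def parse_rev_list(output):
    """Parse `git rev-list --objects` output into (oid, name) pairs.

    Commits come out as a bare oid; trees and blobs as "<oid> <path>".
    """
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        oid, _, name = line.partition(" ")
        entries.append((oid, name))
    return entries


def parse_show_index(output):
    """Collect oids from `git show-index` lines ("<offset> <oid> ...")."""
    return {line.split()[1] for line in output.splitlines() if line.strip()}


def objects_to_pack(rev_list_output, kept_oids):
    """Build pack-objects stdin: every listed object not already kept.

    The path is passed through because pack-objects uses it as a
    hint when choosing delta candidates.
    """
    lines = []
    for oid, name in parse_rev_list(rev_list_output):
        if oid not in kept_oids:
            lines.append(f"{oid} {name}".rstrip())
    return "\n".join(lines) + ("\n" if lines else "")


def repack_outside_keep(repo, rev_args, kept_idx_paths, base_name):
    """Hypothetical driver: enumerate objects ourselves, then call
    pack-objects without --revs. Returns the hash of the new pack."""
    rev = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", *rev_args],
        check=True, capture_output=True, text=True,
    ).stdout

    kept = set()
    for idx in kept_idx_paths:
        # show-index reads a pack .idx file on stdin.
        with open(idx, "rb") as fh:
            out = subprocess.run(
                ["git", "show-index"],
                stdin=fh, check=True, capture_output=True, text=True,
            ).stdout
        kept |= parse_show_index(out)

    result = subprocess.run(
        ["git", "-C", repo, "pack-objects", base_name],
        input=objects_to_pack(rev, kept),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

If rev_args already limits the walk (say, new tips and --not the tips
that were packed last time), the list is small before any filtering
happens, and that is where most of the savings would come from.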