
Re: [PATCH 00/11] Reduce pack-objects memory footprint




On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted:

> The array of object_entry in pack-objects can take a lot of memory
> when pack-objects is run in "pack everything" mode. On linux-2.6.git,
> this array alone takes roughly 800MB.
>
> This series reorders some fields and reduces field size... to keep
> this struct smaller. Its size goes from 136 bytes to 96 bytes (29%) on
> 64-bit linux and saves 260MB on linux-2.6.git.
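
To illustrate the field-reordering idea in general terms, here is a minimal sketch using Python's ctypes. The two structs and their field names are hypothetical and are not the real object_entry layout; they only show how grouping fields of the same width removes alignment padding.

```python
import ctypes

class Interleaved(ctypes.Structure):
    # Hypothetical layout: 1-byte fields sit between 8-byte fields,
    # so the compiler pads after each small field to realign.
    _fields_ = [
        ("type", ctypes.c_uint8),
        ("offset", ctypes.c_uint64),
        ("depth", ctypes.c_uint8),
        ("size", ctypes.c_uint64),
    ]

class Reordered(ctypes.Structure):
    # Same four fields, widest first; the small fields now share
    # a single padded tail instead of each getting their own.
    _fields_ = [
        ("offset", ctypes.c_uint64),
        ("size", ctypes.c_uint64),
        ("type", ctypes.c_uint8),
        ("depth", ctypes.c_uint8),
    ]

print("interleaved:", ctypes.sizeof(Interleaved), "bytes")
print("reordered:  ", ctypes.sizeof(Reordered), "bytes")
```

The exact sizes depend on the platform ABI, but the reordered struct should come out smaller, and the per-entry saving is multiplied by the number of objects in the array.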

I'm very interested in this patch series. I don't have time to test this
one right now (have to run), but with your previous RFC patch, memory use
(in the ~4GB range) on a big in-house repo went down by a bit over 3%,
and the run was ~6% faster.

Before/after: max RSS 4440812 / 4290000 KB, runtime 172.73 / 162.45 s.
This was measured with /usr/bin/time -v, after a full git gc had already
been run.
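
Spelling out the arithmetic on those figures, as a small sketch (the variable and function names are only for illustration):

```python
# Figures quoted above; names are illustrative only.
rss_before, rss_after = 4440812, 4290000    # max RSS from /usr/bin/time -v
time_before, time_after = 172.73, 162.45    # runtime

def percent_drop(before, after):
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100

rss_drop = percent_drop(rss_before, rss_after)
time_drop = percent_drop(time_before, time_after)
print(f"RSS down {rss_drop:.1f}%, runtime down {time_drop:.1f}%")
```

That works out to roughly a 3.4% drop in RSS and about 6% less runtime.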

So not huge, but respectable.

We have a big repo, and it gets repacked on dev KVMs with 6-8GB of
memory, so we're under a fair bit of memory pressure. git-gc slows
things down a lot.

It would be really nice to have something that made it use drastically
less memory at the cost of less efficient packs. Is having to spend
roughly the size of .git/objects in memory something inherent, or just a
limitation of the current implementation? I.e. could we do a first pass
to pick some objects based on some heuristic, then repack them N at a
time, and finally delete the now-obsolete packs?
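
To make the "N at a time" idea concrete, here is a rough sketch of just the batching step. It is not how pack-objects works today; batches(), plan_repack() and the "batch-<i>" pack names are hypothetical, and the selection heuristic and the actual packing are left out entirely.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

def batches(object_ids: Iterable[str], n: int) -> Iterator[List[str]]:
    """Yield successive lists of at most n object IDs."""
    if n <= 0:
        raise ValueError("n must be positive")
    it = iter(object_ids)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk

def plan_repack(object_ids: Iterable[str], n: int) -> List[Dict]:
    """Return one hypothetical pack job per batch of n objects.

    Only one batch's worth of per-object state would need to be held
    at a time; the jobs here just record which IDs go in which pack.
    """
    return [
        {"pack": f"batch-{i}", "objects": chunk}
        for i, chunk in enumerate(batches(object_ids, n))
    ]

if __name__ == "__main__":
    for job in plan_repack(["a1", "b2", "c3", "d4", "e5"], 2):
        print(job["pack"], job["objects"])
```

The trade-off would presumably be the one mentioned above: objects in different batches could not be deltified against each other, so the resulting packs would be less efficient.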

Another thing I've dealt with is that the NFS-mounted storage on these
machines gets exhausted (I'm told) by some pathological operations git
does during repack, and I/O tends to get 5-6x slower. Of course ionice
doesn't help, because the local kernel doesn't know anything about how
harmful the I/O is to the NFS server.