Re: [PATCH/RFC 1/1] gc --auto: exclude the largest giant pack in low-memory config
- Date: Mon, 05 Mar 2018 15:00:16 +0100
- From: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
- Subject: Re: [PATCH/RFC 1/1] gc --auto: exclude the largest giant pack in low-memory config
On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted:
> pack-objects could be a big memory hog especially on large repos,
> everybody knows that. The suggestion to stick a .keep file on the
> largest pack to avoid this problem is also known for a long time.
> Let's do the suggestion automatically instead of waiting for people to
> come to Git mailing list and get the advice. When a certain condition
> is met, gc --auto create a .keep file temporary before repack is run,
> then remove it afterward.
> gc --auto does this based on an estimation of pack-objects memory
> usage and whether that fits in one third of system memory (the
> assumption here is for desktop environment where there are many other
> applications running).
> Since the estimation may be inaccurate and that 1/3 threshold is
> arbitrary, give the user a finer control over this mechanism as well:
> if the largest pack is larger than gc.bigPackThreshold, it's kept.
This is very promising. Saves lots of memory on my ad-hoc testing of
adding a *.keep file on an in-house repo.
> + if (big_pack_threshold)
> + return pack->pack_size >= big_pack_threshold;
> + /* First we have to scan through at least one pack */
> + mem_want = pack->pack_size + pack->index_size;
> + /* then pack-objects needs lots more for book keeping */
> + mem_want += sizeof(struct object_entry) * nr_objects;
> + /*
> + * internal rev-list --all --objects takes up some memory too,
> + * let's say half of it is for blobs
> + */
> + mem_want += sizeof(struct blob) * nr_objects / 2;
> + /*
> + * and the other half is for trees (commits and tags are
> + * usually insignificant)
> + */
> + mem_want += sizeof(struct tree) * nr_objects / 2;
> + /* and then obj_hash, underestimated in fact */
> + mem_want += sizeof(struct object *) * nr_objects;
> + /*
> + * read_sha1_file() (either at delta calculation phase, or
> + * writing phase) also fills up the delta base cache
> + */
> + mem_want += delta_base_cache_limit;
> + /* and of course pack-objects has its own delta cache */
> + mem_want += max_delta_cache_size;
I'm not familiar enough with this part to say, but isn't this assuming a
lot about the distribution of objects in a way that will cause is not to
repack in some pathological cases?
Probably worth documenting...
> + /* Only allow 1/3 of memory for pack-objects */
> + mem_have = total_ram() / 3;
Would be great to have this be a configurable variable, so you could set
it to e.g. 33% (like here), 50% etc.