Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
- Date: Mon, 8 Apr 2019 08:23:26 +0700
- From: Duy Nguyen <pclouds@xxxxxxxxx>
- Subject: Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
On Mon, Apr 8, 2019 at 5:52 AM Christian Couder wrote:
> > Git has a very optimized mechanism to compactly store
> > objects (blobs, trees, commits, etc.) in packfiles. These files are
> > created by:
> > 1. listing objects;
> > 2. sorting the list with some good heuristics;
> > 3. traversing the list with a sliding window to find similar objects in
> > the window, in order to do delta decomposing;
> > 4. compressing the objects with zlib and writing them to the packfile.
> > What we call pack access code in this document is the set of
> > functions responsible for retrieving the objects stored in the
> > packfiles. This process consists, roughly speaking, of three parts:
> > 1. Locate and read the blob from packfile, using the index file;
> > 2. If the blob is a delta, locate and read the base object to apply the
> > delta on top of it;
> > 3. Once the full content is read, decompress it (using zlib inflate).
> > Note: There is a delta cache for the second step so that if another
> > delta depends on the same base object, it is already in memory. This
> > cache is global; also, the sliding windows are global per packfile.
> Yeah, but the sliding windows are used only when creating pack files,
> not when reading them, right?
These windows are actually for reading. We used to just mmap the whole
pack file in the early days, but that was impossible for 4+ GB packs on
32-bit platforms. I think that was one of the reasons sliding windows
were added: to map just the parts we want to read.
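
To make the idea concrete, here is a minimal sketch of mapping only a
window of a file instead of the whole thing. It is not git's actual
code: struct window, map_window(), window_at() and unmap_window() are
hypothetical names made up for this illustration, and it assumes a
POSIX system with mmap().

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical type for this sketch; not git's own window struct. */
struct window {
	unsigned char *base; /* start of the mapping */
	off_t offset;        /* file offset of base (page-aligned) */
	size_t len;          /* number of bytes mapped */
};

/*
 * Map a read-only window that covers `want` bytes starting at file
 * offset `pos`, clamped to the end of the file.
 * Returns 0 on success, -1 on failure.
 */
static int map_window(int fd, off_t file_size, off_t pos, size_t want,
		      struct window *w)
{
	long page = sysconf(_SC_PAGESIZE);
	off_t start, len;
	void *p;

	if (page <= 0 || pos < 0 || pos >= file_size || want == 0)
		return -1;

	/* mmap() requires the file offset to be page-aligned. */
	start = pos - (pos % page);
	len = (pos - start) + (off_t)want;
	if (len > file_size - start)
		len = file_size - start;

	p = mmap(NULL, (size_t)len, PROT_READ, MAP_PRIVATE, fd, start);
	if (p == MAP_FAILED)
		return -1;

	w->base = p;
	w->offset = start;
	w->len = (size_t)len;
	return 0;
}

/* Return a pointer to file offset `pos`, or NULL if it is not mapped. */
static const unsigned char *window_at(const struct window *w, off_t pos)
{
	assert(w && w->base);
	if (pos < w->offset || pos >= w->offset + (off_t)w->len)
		return NULL;
	return w->base + (pos - w->offset);
}

/* Drop the mapping so the address space can be reused. */
static void unmap_window(struct window *w)
{
	munmap(w->base, w->len);
	w->base = NULL;
	w->len = 0;
}
```

A reader that gets NULL from window_at() would unmap the current
window and map a new one around the offset it needs, so the address
space in use stays bounded no matter how large the pack is.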
> > # Points to work on
> > * Investigate pack access call chains and look for non-thread-safe
> > operations on them.
> > * Protect packfile.c read-and-write global variables, such as
> > pack_open_windows, pack_open_fds, etc., using mutexes.
> Do you want to work on making both packfile reading and packfile
> writing thread safe? Or just packfile reading?
Packfile writing is probably already thread-safe, or pretty close to it
(at least the main writing code path in git-pack-objects; for the code
that streams blobs to a pack, I'm not so sure).
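
For the reading side, the "protect the globals with mutexes" point
from the proposal could look roughly like the sketch below. The two
counters are stand-ins for the packfile.c variables named above; the
lock (pack_state_mutex) and all the helper functions are hypothetical
and assume plain POSIX threads, so treat this as an illustration of
the pattern rather than a patch.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins for the packfile.c globals named in the proposal. */
static unsigned int pack_open_windows;
static unsigned int pack_open_fds;

/* Hypothetical lock guarding both counters. */
static pthread_mutex_t pack_state_mutex = PTHREAD_MUTEX_INITIALIZER;

static void note_window_opened(void)
{
	pthread_mutex_lock(&pack_state_mutex);
	pack_open_windows++;
	pthread_mutex_unlock(&pack_state_mutex);
}

static void note_window_closed(void)
{
	pthread_mutex_lock(&pack_state_mutex);
	assert(pack_open_windows > 0);
	pack_open_windows--;
	pthread_mutex_unlock(&pack_state_mutex);
}

static void note_fd_opened(void)
{
	pthread_mutex_lock(&pack_state_mutex);
	pack_open_fds++;
	pthread_mutex_unlock(&pack_state_mutex);
}

/* Read the counter under the lock so the value is not torn or stale. */
static unsigned int open_window_count(void)
{
	unsigned int n;

	pthread_mutex_lock(&pack_state_mutex);
	n = pack_open_windows;
	pthread_mutex_unlock(&pack_state_mutex);
	return n;
}

/* Thread entry point for a demo: record *arg window opens. */
static void *open_many(void *arg)
{
	int i, n = *(int *)arg;

	for (i = 0; i < n; i++)
		note_window_opened();
	return NULL;
}
```

Without the lock, concurrent increments from several threads running
open_many() can be lost; with it, the final count equals the total
number of calls. A single coarse lock like this is the simplest
option, at the cost of serializing every update.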