Re: [GSoC][RFC] Proposal: Make pack access code thread-safe




On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
<christian.couder@xxxxxxxxx> wrote:
> > Git has a very optimized mechanism to compactly store
> > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> > created by[3]:
> >
> > 1. listing objects;
> > 2. sorting the list with some good heuristics;
> > 3. traversing the list with a sliding window to find similar objects in
> > the window, in order to do delta compression;
> > 4. compressing the objects with zlib and writing them to the packfile.
> >
> > What we are calling pack access code in this document is the set of
> > functions responsible for retrieving the objects stored in the
> > packfiles. This process consists, roughly speaking, of three parts:
> >
> > 1. Locate and read the blob from the packfile, using the index file;
> > 2. If the blob is a delta, locate and read the base object to apply the
> > delta on top of it;
> > 3. Once the full content is read, decompress it (using zlib inflate).
> >
> > Note: There is a delta cache for the second step so that if another
> > delta depends on the same base object, it is already in memory. This
> > cache is global; the sliding windows are also global, per packfile.
>
> Yeah, but the sliding windows are used only when creating pack files,
> not when reading them, right?

These windows are actually for reading. In the early days we just
mmap'd the whole pack file, but that is impossible for 4+ GB packs on
32-bit platforms. I think that was one of the reasons sliding windows
were added: to map just the parts we want to read.
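
To illustrate the idea (this is only a rough sketch, not the actual
packfile.c code), mapping just a window around the offset you want looks
something like the following. The names struct window, map_window() and
read_byte_at() are made up for this example, and the window size is an
arbitrary parameter:

```c
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window descriptor; not git's actual data structure. */
struct window {
	unsigned char *base; /* start of the mapping */
	off_t offset;        /* file offset that base corresponds to */
	size_t len;          /* number of bytes mapped */
};

/* Map at most win_size bytes of fd so that file offset pos is covered. */
static int map_window(int fd, off_t file_size, off_t pos, size_t win_size,
		      struct window *w)
{
	long page = sysconf(_SC_PAGESIZE);
	off_t start;
	size_t len;

	assert(w != NULL);
	if (page <= 0 || pos < 0 || pos >= file_size)
		return -1;
	/* mmap offsets must be page-aligned, so round pos down. */
	start = pos - (pos % page);
	/* Use at least one page so the window always contains pos. */
	len = win_size < (size_t)page ? (size_t)page : win_size;
	/* Never map past the end of the file. */
	if ((off_t)len > file_size - start)
		len = (size_t)(file_size - start);

	w->base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, start);
	if (w->base == MAP_FAILED)
		return -1;
	w->offset = start;
	w->len = len;
	return 0;
}

/* Read one byte at pos through a temporary window, then unmap it. */
static int read_byte_at(int fd, off_t file_size, off_t pos, size_t win_size,
			unsigned char *out)
{
	struct window w;

	if (map_window(fd, file_size, pos, win_size, &w) < 0)
		return -1;
	*out = w.base[pos - w.offset];
	munmap(w.base, w.len);
	return 0;
}
```

A real implementation would keep windows around and reuse them instead of
unmapping after every read, which is exactly where shared state (and so
the thread-safety question) comes in.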

> > # Points to work on
> >
> > * Investigate pack access call chains and look for non-thread-safe
> > operations on them.
> > * Protect packfile.c read-and-write global variables, such as
> > pack_open_windows, pack_open_fds, etc., using mutexes.
>
> Do you want to work on making both packfile reading and packfile
> writing thread safe? Or just packfile reading?

Packfile writing is probably already thread-safe, or pretty close to it
(at least the main writing code path in git-pack-objects; I'm not so
sure about streaming blobs to a pack).
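
On the reading side, the mutex approach from the proposal could look
roughly like this. It is a minimal sketch only: the two counters stand in
for the packfile.c globals named above (their types are assumed here),
and pack_state_mutex and the helper functions are hypothetical names,
not existing git symbols:

```c
#include <pthread.h>

#define OPENS_PER_THREAD 100000

/* Stand-ins for the packfile.c globals; types are an assumption. */
static unsigned int pack_open_windows;
static unsigned int pack_open_fds;

/* Hypothetical lock guarding both counters. */
static pthread_mutex_t pack_state_mutex = PTHREAD_MUTEX_INITIALIZER;

static void note_window_opened(void)
{
	pthread_mutex_lock(&pack_state_mutex);
	pack_open_windows++;
	pthread_mutex_unlock(&pack_state_mutex);
}

static void note_fd_opened(void)
{
	pthread_mutex_lock(&pack_state_mutex);
	pack_open_fds++;
	pthread_mutex_unlock(&pack_state_mutex);
}

/* Read a counter under the lock so callers see a consistent value. */
static unsigned int open_window_count(void)
{
	unsigned int n;

	pthread_mutex_lock(&pack_state_mutex);
	n = pack_open_windows;
	pthread_mutex_unlock(&pack_state_mutex);
	return n;
}

static unsigned int open_fd_count(void)
{
	unsigned int n;

	pthread_mutex_lock(&pack_state_mutex);
	n = pack_open_fds;
	pthread_mutex_unlock(&pack_state_mutex);
	return n;
}

/* Thread body for a quick check: bump the window counter many times. */
static void *open_many_windows(void *arg)
{
	(void)arg;
	for (int i = 0; i < OPENS_PER_THREAD; i++)
		note_window_opened();
	return NULL;
}
```

A single lock for all of the pack state is the simplest thing that could
work; whether it is too coarse for real workloads is something the
call-chain investigation would have to show.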
-- 
Duy