Re: [PATCH 2/2] archive: avoid spawning `gzip`
- Date: Mon, 15 Apr 2019 17:35:56 -0400
- From: Jeff King <peff@xxxxxxxx>
- Subject: Re: [PATCH 2/2] archive: avoid spawning `gzip`
On Sun, Apr 14, 2019 at 12:01:10AM +0200, René Scharfe wrote:
> >> As we already link to the zlib library, we can perform the compression
> >> without even requiring gzip on the host machine.
> > Very cool. It's nice to drop a dependency, and this should be a bit more
> > efficient, too.
> Getting rid of dependencies is good, and using zlib is the obvious way to
> generate .tgz files. Last time I tried something like that, a separate gzip
> process was faster, though -- at least on Linux. How does this one
I'd expect a separate gzip to be faster in wall-clock time on a
multi-core machine, but to consume more CPU overall. I'm slightly
surprised that your timings show it actually winning on total CPU, too.
Here are best-of-five times for "git archive --format=tar.gz HEAD" on
linux.git (the machine is a quad-core):
[before, separate gzip]
[after, internal gzwrite]
which does show what I expect (longer overall, but less total CPU).
Which one you prefer depends on your situation, of course. A user on a
workstation with multiple cores probably cares most about end-to-end
latency and using all of their available horsepower. A server hosting
repositories and receiving many unrelated requests probably cares more
about total CPU (though the differences there are small enough that it
may not even be worth having a config knob to un-parallelize it).
> Doing compression in its own thread may be a good idea.
Yeah. It might even make the patch simpler, since I'd expect it to be
implemented with start_async() and a descriptor, making it look just
like a gzip pipe to the caller. :)