Web lists-archives.com

Re: [PATCH 1/2] archive: replace write_or_die() calls with write_block_or_die()

On Fri, Apr 12, 2019 at 04:04:39PM -0700, Rohit Ashiwal via GitGitGadget wrote:

> From: Rohit Ashiwal <rohit.ashiwal265@xxxxxxxxx>
> MinGit for Windows comes without `gzip` bundled inside, git-archive uses
> `gzip -cn` to compress tar files but for this to work, gzip needs to be
> present on the host system.
> In the next commit, we will change the gzip compression so that we no
> longer spawn `gzip` but let zlib perform the compression in the same
> process instead.
> In preparation for this, we consolidate all the block writes into a
> single function.

Sounds like a good preparatory step. This part confused me, though:

> @@ -38,11 +40,21 @@ static int write_tar_filter_archive(const struct archiver *ar,
>  #define USTAR_MAX_MTIME 077777777777ULL
>  #endif
> +/* writes out the whole block, or dies if fails */
> +static void write_block_or_die(const char *block) {
> +	if (gzip) {
> +		if (gzwrite(gzip, block, (unsigned) BLOCKSIZE) != BLOCKSIZE)
> +			die(_("gzwrite failed"));
> +	} else {
> +		write_or_die(1, block, BLOCKSIZE);
> +	}
> +}

What is gzwrite()? At first I thought this was an out-of-sequence bit of
the series, but it turns out that this is a zlib.h interface. So the
idea (I think) is that here we introduce a "gzip" variable that is
always false, and this first conditional arm is effectively dead code.
And then in a later patch we'd set up "gzip" and it would become

I think it would be less confusing if this just factored out
write_block_or_die(), which starts as a thin wrapper and then grows the
gzip parts in the next patch.

A few nits on the code itself:

> +static gzFile gzip;
> [...]
> +       if (gzip) {

Is it OK for us to ask about the truthiness of this opaque type? That
works if it's really a pointer behind the scenes, but it seems like it
would be equally OK for zlib to declare it as a struct.

It looks OK in my version of zlib, and that library tends to be fairly
conservative so I wouldn't be surprised if it was that way back to the
beginning and remains that way for eternity. But it feels like a bad

> +		if (gzwrite(gzip, block, (unsigned) BLOCKSIZE) != BLOCKSIZE)

This cast is interesting. All of the matching write_or_die() calls are
promoting it to a size_t, which is also unsigned.

BLOCKSIZE is a constant. Should we be defining it with a "U" in the first place?

I doubt it matters much either way from a correctness perspective. I
just wonder when I see a cast like that if we're going to get unexpected
truncation or similar.