Web lists-archives.com

Re: [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out




Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> +struct output_state {
> +	char buffer[8193];
> +	int used;
> +};
> +
> +static int read_pack_objects_stdout(int outfd, struct output_state *os)
> +{

The naming feels quite odd.  We are downstream of pack-objects and
reading its standard output stream, so "read stdout-of-pack-objects"
is not wrong per-se, but it may be just me.  Stepping back and what
the function is really about often helps us to understand what we
are doing.

I think the function you extracted from the existing logic is
responsible for relaying the data read from pack-objects to the
fetch-pack client on the other side of the wire.  So from that point
of view, singling out "read" as if it is a primary thing the
function does is already suboptimal.  Perhaps

	static int relay_pack_data(int in, struct output_state *os)

because the fd is what we "read" from (hence, 'in', not 'out'), and
relaying is what we do and reading is only one half of doing so?

> +	/* Data ready; we keep the last byte to ourselves
> +	 * in case we detect broken rev-list, so that we
> +	 * can leave the stream corrupted.  This is
> +	 * unfortunate -- unpack-objects would happily
> +	 * accept a valid packdata with trailing garbage,
> +	 * so appending garbage after we pass all the
> +	 * pack data is not good enough to signal
> +	 * breakage to downstream.
> +	 */

Yeah, I recall writing this funky and unfortunate logic in 2006.
Perhaps we want to update the comment style to make it a bit more
modern?

The "Data ready;" part of the comment applies more to what the
caller of this logic does (i.e. poll returned and revents indicates
we can read, and that is why we are calling into this logic).  The
remainder is the comment that is relevat to this logic.

> +	ssize_t readsz;
> +
> +	readsz = xread(outfd, os->buffer + os->used,
> +		       sizeof(os->buffer) - os->used);

So we read what we could read in the remaining buffer.

I notice that you alloated 8k+1 bytes; would this code end up
reading that 8k+1 bytes in full under the right condition?  I am
mainly wondering where the need for +1 comes from.

> +	if (readsz < 0) {
> +		return readsz;
> +	}

OK, report an error to the caller by returning negative.  I am
guessing that you'd have more code in this block that currently have
only one statement in the future steps, perhaps.

> +	os->used += readsz;
> +
> +	if (os->used > 1) {
> +		send_client_data(1, os->buffer, os->used - 1);
> +		os->buffer[0] = os->buffer[os->used - 1];

OK, this corresponds to the "*cp++ = buffered"  in the original just
before xread().

> +		os->used = 1;
> +	} else {
> +		send_client_data(1, os->buffer, os->used);
> +		os->used = 0;

I am not sure if the code is correct when os->used happens to be 1
(shouldn't we hold the byte, skip the call to send_client_data(),
and go back to poll() to expect more data?), but this is a faithful
code movement and rewrite of the original.

I've read the caller (below, snipped) and the conversion looked
sensible there as well.  The final flushing of the byte(s) held back
in *os also looks good.