Web lists-archives.com

Re: [PATCH] diff: add support for reading files literally with --no-index




On Thu, Dec 20, 2018 at 12:26:10AM +0000, brian m. carlson wrote:

> In some shells, such as bash and zsh, it's possible to use a command
> substitution to provide the output of a command as a file argument to
> another process, like so:
> 
>   diff -u <(printf "a\nb\n") <(printf "a\nc\n")
> 
> However, this syntax does not produce useful results with git diff
> --no-index.
> 
> On macOS, the arguments to the command are named pipes under /dev/fd,
> and git diff doesn't know how to handle a named pipe. On Linux, the
> arguments are symlinks to pipes, so git diff "helpfully" diffs these
> symlinks, comparing their targets like "pipe:[1234]" and "pipe:[5678]".
> 
> Because this behavior is not very helpful, and because git diff has many
> features that people would like to use even on non-Git files, add an
> option to git diff --no-index to read files literally, dereferencing
> symlinks and reading them as a normal file.
> 
> Note that this behavior requires that the files be read entirely into
> memory, just as we do when reading from standard input.

Yeah, I think this is a useful feature to have. It seemed familiar, and
indeed there were some patches a while back:

  https://public-inbox.org/git/20170113102021.6054-1-dennis@xxxxxxxxxxxxxxx/

Compared to that series, the implementation here seems simpler and less
likely to have unexpected effects.

Reading the patch, one thing I _thought_ it did (but it does not) is to
impact only the actual arguments on the command line, not entries we
find from recursing trees.

I.e., if I said:

  ln -s bar a/foo
  ln -s baz b/foo
  git diff --no-index --literally a/foo b/foo
  git diff --no-index --literally a/ b/

should I still see a/foo as a symlink in the second one, or not?

The distinction is a bit subtle, but I think treating only the actual
top-level arguments as symlinks would solve your problem, but still
allow a more detailed diff for the recursive cases.

There's some more discussion on this in the earlier iteration of that
series:

  https://public-inbox.org/git/20161111201958.2175-1-dennis@xxxxxxxxxxxxxxx/

I also suspect people might want a config option to make this the
default (which should be OK, as git-diff is, after all, porcelain). But
either way, a command line option is the first step.

> Signed-off-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx>
> ---
> This is a long-standing annoyance of mine, and it also makes some use
> cases possible (for example, diffing filtered and non-filtered objects).
> 
> We don't include a test for the pipe scenario because I couldn't get
> that case to work in portable shell (although of course it works in
> bash). I have, however, tested it on both macOS and Linux. No clue how
> this works on Windows.

I think focusing on symlinks makes sense. You could probably test pipes
with mkfifo, but looking at your implementation, it's pretty clear that
it will operate on anything that works with read().

> diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
> index 030f162f30..4c4695c88d 100644
> --- a/Documentation/git-diff.txt
> +++ b/Documentation/git-diff.txt
> @@ -111,6 +111,11 @@ include::diff-options.txt[]
>  	"Unmerged".  Can be used only when comparing the working tree
>  	with the index.
>  
> +--literally::
> +  Read the specified files literally, as `diff` would,
> +  dereferencing any symlinks and reading data from pipes.
> +  This option only works with `--no-index`.
> +

Looks like spaces for indent, whereas the context uses tabs.

I think "literal" is a good way to describe this concept. If we do grow
a config option to make this the default, then countermanding it would
be "--no-literally", which parses a bit funny as English.

If you agree with my "only the top-level" line of reasoning above, maybe
"--literal-arguments" and "--no-literal-arguments" might make sense.

We could also in theory offer several levels: no literals, top-level
literals, everything literal, at which point it becomes a tri-state.

> -static struct diff_filespec *noindex_filespec(const char *name, int mode)
> +static int populate_literally(struct diff_filespec *s)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t size = 0;
> +	int fd = xopen(s->path, O_RDONLY);
> +
> +	if (strbuf_read(&buf, fd, 0) < 0)
> +		return error_errno("error while reading from '%s'", s->path);
> +
> +	s->should_munmap = 0;
> +	s->data = strbuf_detach(&buf, &size);
> +	s->size = size;
> +	s->should_free = 1;
> +	s->read_literally = 1;
> +	return 0;
> +}
> +
> +static struct diff_filespec *noindex_filespec(const char *name, int mode,
> +					      struct diff_options *o)
> [...]
> +	else if (o->flags.read_literally)
> +		populate_literally(s);

The implementation is pleasantly simple. If we want to do the "only
top-level" thing, we'd just have to pass in a depth flag here.

> diff --git a/diff.c b/diff.c
> index dc9965e836..740d0087b9 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4282,18 +4282,18 @@ static void run_diff_cmd(const char *pgm,
>  		fprintf(o->file, "* Unmerged path %s\n", name);
>  }
>  
> -static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *istate)
> +static void diff_fill_oid_info(struct diff_filespec *one, struct diff_options *o)

It might be worth breaking these "pass the options around" hunks into a
separate preparatory patch.

-Peff