Re: [PATCH] t5534: fix misleading grep invocation

Hi Michael,

On Thu, 6 Jul 2017, Michael J Gruber wrote:

> Junio C Hamano venit, vidit, dixit 05.07.2017 18:26:
> > Johannes Schindelin <johannes.schindelin@xxxxxx> writes:
> > 
> >> It seems to be a little-known feature of `grep` (and it certainly came
> >> as a surprise to this here developer who believed to know the Unix tools
> >> pretty well) that multiple patterns can be passed in the same
> >> command-line argument simply by separating them by newlines. Watch, and
> >> learn:
> >>
> >> 	$ printf '1\n2\n3\n' | grep "$(printf '1\n3\n')"
> >> 	1
> >> 	3
> >>
> >> That behavior also extends to patterns passed via `-e`, and it is not
> >> modified by passing the option `-E` (but trying this with -P issues the
> >> error "grep: the -P option only supports a single pattern").
> >>
> >> It seems that there are more old Unix hands who are surprised by this
> >> behavior, as grep invocations of the form
> >>
> >> 	grep "$(git rev-parse A B) C" file
> >>
> >> were introduced in a85b377d041 (push: the beginning of "git push
> >> --signed", 2014-09-12), and later faithfully copy-edited in b9459019bbb
> >> (push: heed user.signingkey for signed pushes, 2014-10-22).
> >>
> >> Please note that the output of `git rev-parse A B` separates the object
> >> IDs via *newlines*, not via spaces, and those newlines are preserved
> >> because the interpolation is enclosed in double quotes.
> >>
> >> As a consequence, these tests try to validate that the file contains
> >> either A's object ID, or B's object ID followed by C, or both. Clearly,
> >> however, what the test wanted to see is that there is a line that
> >> contains all of them.
> >>
> >> This is clearly unintended, and the grep invocations in question really
> >> match too many lines.
>
> [...]
>
> How did you spot this? Are there grep versions that behave differently?

Yes, there are grep versions that behave differently... how did you guess?

I am in the middle of an extended investigation trying to assess how
feasible it would be to use a native Win32 port of BusyBox (started by
long-time Git contributor Nguyễn Thái Ngọc Duy) in Git for Windows to
execute the many, many remaining Unix shell scripts that are a core part
of Git (including crucial functionality such as bisect, rebase, stash and
submodule, for which we suffer portability and performance problems).

And it is BusyBox's grep that does not split the pattern argument at
newlines into alternative patterns.

I first considered patching BusyBox to adhere to the expected behavior,
but then I looked closer and saw that the test's grep invocations actually
matched two lines instead of what I expected. An even closer look made me
suspect that the original intention was different from what the script
actually does, and for once I tried to be nice in my commit message.
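
To make the difference concrete, here is a minimal sketch. The object IDs
(`aaaa`, `bbbb`), the trailing `C`, and the file name `cert` are made-up
stand-ins, not the values the test uses, and joining the IDs with spaces is
just one way to get a single-line pattern, not necessarily what the patch
does:

```shell
# Hypothetical stand-in for the newline-separated output of `git rev-parse A B`.
ids=$(printf 'aaaa\nbbbb\n')

# Hypothetical file: only the second line contains both IDs followed by C.
cat >cert <<\EOF
aaaa unrelated
aaaa bbbb C
zzzz
EOF

# The newline inside the quoted pattern makes a POSIX grep treat it as two
# alternative patterns: "aaaa" OR "bbbb C". Both of the first two lines match.
loose=$(grep -c "$ids C" cert)

# Leaving $ids unquoted lets the shell split it into words, and echo joins
# them with a space, so the pattern is the single line "aaaa bbbb C".
strict=$(grep -c "$(echo $ids) C" cert)

# With a grep that splits patterns at newlines (e.g. GNU grep) this prints
# "loose=2 strict=1".
echo "loose=$loose strict=$strict"
```

A grep that does not split the pattern at newlines would report a different
count for the first invocation, which is how the discrepancy surfaces.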

Ciao,
Dscho