Web lists-archives.com

Re: [PATCH 2/2] diffcore-pickaxe: add --pickaxe-raw-diff for use with -G

On Thu, Apr 25 2019, Eugeniu Rosca wrote:

> Hi Ævar,
> Thanks for the amazingly fast reply and for the useful feature (yay!).
> On Wed, Apr 24, 2019 at 05:37:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Apr 24 2019, Ævar Arnfjörð Bjarmason wrote:
>> > Add the ability for the -G<regex> pickaxe to search only through added
>> > or removed lines in the diff, or even through an arbitrary amount of
>> > context lines when combined with -U<n>.
>> >
>> > This has been requested[1][2] a few times in the past, and isn't
>> > currently possible. Instead users need to do -G<regex> and then write
>> > their own post-parsing script to see if the <regex> matched added or
>> > removed lines, or both. There was no way to match the adjacent context
>> > lines other than running and grepping the equivalent of a "log -p -U<n>".
>> >
>> > 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/
>> > 2. https://public-inbox.org/git/20190424102609.GA19697@xxxxxxxxxxxxxxxxxxxxxx/
>> I see now once I actually read Eugeniu Rosca's E-Mail upthread instead
>> of just knee-jerk sending out patches that this doesn't actually solve
>> his particular problem fully.
>> I.e. if you want some AND/OR matching support this --pickaxe-raw-diff
>> won't give you that, but it *does* make it much easier to script up such
>> an option. Run it twice with -G"\+<regex>" and -G"-<regex>", "sort |
>> uniq -c" the commit list, and see which things occur once or twice.
>> Of course that doesn't give you more complex nested and/or cases, but if
>> git-log grew support for that like git-grep has the -G option could use
>> that, although at that point we'd probably want to spend effort on
>> making the underlying machinery smarter to avoid duplicate work.
> Purely from user's standpoint, I feel more comfortable with `git grep`
> and `git log --grep` particularly b/c they support '--all-match' [2],
> allowing more flexible multi-line searches. Based on your feedback, it
> looks to me that `git log -G/-S` did not have a chance to develop their
> features to the same level.
>> Furthermore, and quoting Eugeniu upthread:
>>     In the context of [1], I would like to find all Linux commits which
>>     replaced:
>>     	'devm_request_threaded_irq(* IRQF_SHARED *)'
>>     by:
>>     	'devm_request_threaded_irq(* IRQF_ONESHOT *)'
>> Such AND/OR machinery would give you what you wanted *most* of the time,
>> but it would also find removed/added pairs that were "unrelated" as well
>> as "related". Solving *that* problem is more complex, but something the
>> diff machinery could in principle expose.
> I expect some false positives, since git is agnostic on the language
> used to write the versioned files (the latter sounds like a research
> topic to me - I hope there is somebody willing to experiment with that
> in future).

I was thinking of something where the added/removed could be filtered to
cases that occur in the same diff hunk.

>> But the "-G<regex> --pickaxe-raw-diff" feature I have as-is is very
>> useful,
> I agree. I am a bit bothered by the fact that
> `git log --oneline -Ux -G<regex> --pickaxe-raw-diff` outputs the
> contents/patch of a commit. My expectation is that we have the
> `log -p` knob for that?

This is unrelated to --pickaxe-raw-diff, -U<n> just implies -p in
general. See e.g. "git log -U1".

>> I've had at least two people off-list ask me about a problem
>> that would be solved by it just in the last 1/2 year (unrelated to them
>> having seen the WIP patch I sent last October).
>> It's more general than Junio's suggested --pickaxe-ignore-{add,del}
> As a user, I would be happier to freely grep in the raw commit contents
> rather than learning a dozen of new options which provide small subsets
> of the same functionality. So, I personally vote for the approach taken
> by --pickaxe-raw-diff. This would also reduce the complexity of my
> current git aliases and/or allow dropping some of them altogether.
> Quite off topic, but I also needed to come up with a solution to get
> the C functions modified/touched by a git commit [3]. It is my
> understanding that --pickaxe-raw-diff can't help here and I still have
> to rely on parsing the output of `git log -p`?

Yeah, it doesn't help with that. When it runs we haven't generated the
context line or the "@@" line yet, that's later. You can breakpoint on
xdl_format_hunk_hdr and diffgrep_consume to see it in action.

It's a waste of CPU to generate that for all possible hunks, most of
which we won't show at all.

But it's of course possible to do so by running the full diff machinery
over every commit and matching on the result, the current pickaxe is
just taking shortcuts and not doing that.

>> options[1], but those could be implemented in terms of this underlying
>> code if anyone cared to have those as aliases. You'd just take the
>> -G<regex> and prefix the <regex> with "^\+" or "^-" as appropriate and
>> turn on the DIFF_PICKAXE_G_RAW_DIFF flag.
>> 1. https://public-inbox.org/git/xmqqwoqrr8y2.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/
> Thanks!
> [2] https://gitster.livejournal.com/30195.html
> [3] https://stackoverflow.com/questions/50707171/how-to-get-all-c-functions-modified-by-a-git-commit