Re: An appropriate directory search tool?




On Sun, Oct 21, 2018 at 08:48:28AM -0500, David Wright wrote:
> On Sun 21 Oct 2018 at 05:25:05 (-0500), Richard Owlett wrote:
> > I wish a list of files with a specific extension in a directory which
> > contain keywordA but not keywordB. Recursing down the directory tree
> > was the primary objection to the MATE search tool.
> 
> At last, a direct question!
> 
> $ grep -L keywordB $(grep -l keywordA a-directory/*extension)
> 
> Mix with quotes according to taste and needs.

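That doesn't recurse (it only considers files at depth 1 in a single
directory), and it falls apart on filenames with whitespace or glob
characters, because the unquoted $(...) gets word-split and glob-expanded.
It also sits there reading stdin if the inner grep matches nothing.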
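
To make the whitespace point concrete, here is a minimal sketch of a
non-recursive variant that survives awkward filenames.  The scratch
directory, the file names and the .myext extension are made up for the
demonstration, and mktemp -d is assumed to be available (it is not in
POSIX, though nearly every system has it).

```sh
#!/bin/sh
# Hypothetical scratch directory; all names here are made up for the demo.
dir=$(mktemp -d) || exit 1
mkdir "$dir/a-directory"
echo keywordA > "$dir/a-directory/with space.myext"
echo keywordA keywordB > "$dir/a-directory/bad.myext"

# Non-recursive, whitespace-safe variant: the glob results are never
# word-split, because "$f" is quoted everywhere it is used.
matches=$(
  for f in "$dir"/a-directory/*.myext; do
    [ -f "$f" ] || continue    # glob matched nothing, or not a regular file
    grep -q keywordA "$f" && ! grep -q keywordB "$f" &&
      printf '%s\n' "${f##*/}"
  done
)
printf '%s\n' "$matches"
rm -rf "$dir"
```

This runs grep up to twice per file, which is fine for a single directory
but is one reason a single awk pass is nicer once recursion is involved.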

If we ignore the recursion part for a moment, I have a FAQ for the
"match A but not B" part:

https://mywiki.wooledge.org/BashFAQ/079

The specific example for this case (foo but NOT bar) is at the bottom:

awk '/foo/{good=1} /bar/{good=0;exit} END{exit !good}'

So, all we have to do is write the recursion and extension-filtering
parts and link them together with the awk command.  This is fairly
straightforward with the standard tools: find's -exec acts as a test,
so -print only runs for the files where awk exits 0.

find . -type f -name '*.myext' -exec \
  awk '/keywordA/{good=1} /keywordB/{good=0;exit} END{exit !good}' {} \; -print


Testing:

wooledg:~$ mkdir /tmp/x && cd "$_"
wooledg:/tmp/x$ mkdir -p a/b/c a/b/d
wooledg:/tmp/x$ echo keywordA > a/b/c/good.myext
wooledg:/tmp/x$ echo keywordA keywordB > a/b/d/bad.myext
wooledg:/tmp/x$ find . -type f -name '*.myext' -exec \
>   awk '/keywordA/{good=1} /keywordB/{good=0;exit} END{exit !good}' {} \; -print
./a/b/c/good.myext


Now, the obvious unstated part of the question is that he will want
keywordA and keywordB to be passed as parameters (although knowing him,
he will require 17 messages to tell us this).

This is where it actually gets "hard", because the obvious thing to do
would be to change the quotes on the awk command and embed $1 and $2 in
it directly.  That is a TRAP.  It's a code injection bug: awk parses the
parameters as part of its program, so a pattern containing a slash, a
quote or a brace can end the regex early and run whatever follows it as
awk code.
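
A small sketch of what the trap looks like in practice.  The hostile
"pattern" below is invented for illustration; the point is only that the
text after the first slash ends up being executed by awk rather than
matched.

```sh
#!/bin/sh
# DO NOT use this pattern in a real script; it exists only to show the bug.
# Hypothetical hostile "pattern": the slash closes the regex early, and the
# rest is parsed by awk as program text.
set -- 'x/{print "injected"} /y' keywordB

# The shell pastes $1 and $2 into the program before awk ever sees it, so
# awk receives:
#   /x/{print "injected"} /y/{good=1} /keywordB/{good=0;exit} END{exit !good}
# awk exits 1 here (good is never set), hence the "|| :".
out=$(echo x | awk "/$1/{good=1} /$2/{good=0;exit} END{exit !good}") || :
printf '%s\n' "$out"
```

A print statement is harmless, but awk can also call system() and write
to files, so the same hole lets the "pattern" do far worse.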

For that part of the program, I refer you to:

https://mywiki.wooledge.org/BashProgramming/05

I would use the "awk variables" approach for this one:

#!/bin/sh
if test "$#" != 2; then
  printf "usage: %s goodpat badpat\n" "$0" >&2
  exit 1
fi

find . -type f -name '*.myext' -exec \
  awk -v goodpat="$1" -v badpat="$2" \
    '$0 ~ goodpat {good=1} $0 ~ badpat {good=0;exit} END{exit !good}' {} \; \
  -print


And, testing:

wooledg:/tmp/x$ set -- wordA wordB
wooledg:/tmp/x$ find . -type f -name '*.myext' -exec \
>   awk -v goodpat="$1" -v badpat="$2" \
>     '$0 ~ goodpat {good=1} $0 ~ badpat {good=0;exit} END{exit !good}' {} \; \
>   -print
./a/b/c/good.myext
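
One caveat with -v: awk processes backslash escape sequences in the
assigned value, so a pattern such as a\.b may not reach the regex engine
unchanged (what happens to an unrecognized escape differs between awk
implementations).  If the patterns may contain backslashes, a possible
alternative is to pass them through the environment and read them from
ENVIRON, which is not escape-processed.  This is only a sketch: the
scratch files are hypothetical, and the variable names goodpat and badpat
are simply reused from the script above.

```sh
#!/bin/sh
# Hypothetical scratch files for the demonstration.
dir=$(mktemp -d) || exit 1
echo 'a.b' > "$dir/dot.myext"
echo 'aXb' > "$dir/nodot.myext"

# A pattern with a backslash: "a", a literal dot, "b".
set -- 'a\.b' 'keywordB'

# Same awk logic as above, but the patterns arrive via the environment,
# which find passes along to each awk it starts.
matches=$(
  cd "$dir" &&
  goodpat=$1 badpat=$2 find . -type f -name '*.myext' -exec \
    awk '$0 ~ ENVIRON["goodpat"] {good=1}
         $0 ~ ENVIRON["badpat"]  {good=0; exit}
         END {exit !good}' {} \; -print
)
printf '%s\n' "$matches"
rm -rf "$dir"
```

Only the file containing a literal "a.b" is listed; "aXb" is not, because
the backslash survived and the dot stayed literal.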


And then, the obvious next extension after THAT would be to make the
filename extension a parameter.  The shell part of that one is super
easy (no code injection problems with find -name), so I won't bother
showing it.
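
For anyone who does want to see it, a minimal sketch follows.  The
function name find_ab and the argument order (goodpat, badpat, ext) are
arbitrary choices made for this example, not something settled in the
thread, and the scratch tree just recreates the test files from above
plus one file with a different extension.

```sh
#!/bin/sh
# Hypothetical helper: usage is  find_ab goodpat badpat ext
find_ab() {
  if test "$#" != 3; then
    printf 'usage: find_ab goodpat badpat ext\n' >&2
    return 1
  fi
  # "$3" is data to find, not code.  It is still part of a glob, though,
  # so an extension containing *, ? or [ would be treated as a pattern.
  find . -type f -name "*.$3" -exec \
    awk -v goodpat="$1" -v badpat="$2" \
      '$0 ~ goodpat {good=1} $0 ~ badpat {good=0;exit} END{exit !good}' {} \; \
    -print
}

# Recreate the test tree in a scratch directory (mktemp -d assumed).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir -p a/b/c a/b/d
echo keywordA > a/b/c/good.myext
echo keywordA keywordB > a/b/d/bad.myext
echo keywordA > a/b/c/other.txt

result=$(find_ab keywordA keywordB myext)
printf '%s\n' "$result"
cd / && rm -rf "$dir"
```

The only change from the two-parameter version is the double-quoted
"*.$3" in place of the single-quoted '*.myext'.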

At that point, the user interface becomes the real issue.  Do you
put the extension argument first, or last?  Do you make it an option?
Do you hardcode a default extension, or does the lack of a specified
extension mean that you drop the -name filter altogether?  Or do you
give up the command line interface entirely, and go with a Tk dialog?

But he'll never, ever, EVER be able to answer those questions, so we
won't have to worry about it.