Re: An appropriate directory search tool?

On Mon 22 Oct 2018 at 09:09:12 (-0400), Greg Wooledge wrote:
> On Sun, Oct 21, 2018 at 08:48:28AM -0500, David Wright wrote:
> > On Sun 21 Oct 2018 at 05:25:05 (-0500), Richard Owlett wrote:
> > > I wish a list of files with a specific extension in a directory which
> > > contain keywordA but not keywordB. Recursing down the directory tree
> > > was the primary objection to the MATE search tool.
                      ↑↑↑↑↑↑↑↑↑
> > 
> > At last, a direct question!
> > 
> > $ grep -L keywordB $(grep -l keywordA a-directory/*extension)
> > 
> > Mix with quotes according to taste and needs.
> 
> That doesn't recurse (it only considers files at depth 1 in a single
> subdirectory),

Specifically required by the OP.

> and it falls apart on filenames with whitespace.

Left as an exercise for the reader.
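
That said, here is one sketch that copes with whitespace and still looks at depth 1 only. It never word-splits grep's output. Instead it loops over the glob itself, whose results are never split, and runs grep -q twice per file. The function name a_not_b and its argument order are made up for illustration.

```shell
# a_not_b DIR EXT GOOD BAD  (hypothetical helper, not a standard command)
# Print each regular file matching DIR/*EXT (depth 1 only) that contains
# GOOD but not BAD.  GOOD and BAD are grep basic regular expressions.
a_not_b() {
  for f in "$1"/*"$2"; do
    # Skip directories, and the unexpanded pattern itself when nothing matches.
    [ -f "$f" ] || continue
    # -q: answer via exit status only; -e and -- keep patterns or names
    # that start with '-' from being taken as options.
    grep -q -e "$3" -- "$f" && ! grep -q -e "$4" -- "$f" && printf '%s\n' "$f"
  done
  return 0
}

# Usage, mirroring the original command:
#   a_not_b a-directory extension keywordA keywordB
```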

> If we ignore the recursion part for a moment, I have a FAQ for the
> "match A but not B" part:
> 
> https://mywiki.wooledge.org/BashFAQ/079
> 
> The specific example for this case (foo but NOT bar) is at the bottom:
> 
> awk '/foo/{good=1} /bar/{good=0;exit} END{exit !good}'
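
A quick way to convince yourself of the exit-status logic is to feed the one-liner a few inputs on stdin. The wrapper name foo_not_bar is mine, purely for the demo. The key points: an exit inside a rule still runs the END block, and when foo never appears, good is never set, so !good is 1.

```shell
# foo_not_bar: hypothetical wrapper around the BashFAQ/079 one-liner.
# Exit status 0 iff some line matches foo and no line matches bar.
foo_not_bar() {
  awk '/foo/{good=1} /bar/{good=0;exit} END{exit !good}'
}

printf 'foo\n'      | foo_not_bar && echo match || echo "no match"  # match
printf 'foo\nbar\n' | foo_not_bar && echo match || echo "no match"  # no match: bar clears good
printf 'bar\nfoo\n' | foo_not_bar && echo match || echo "no match"  # no match: awk stops at bar
printf 'baz\n'      | foo_not_bar && echo match || echo "no match"  # no match: good never set
```
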
> 
> So, all we have to do is write the recursion and extension-filtering
> parts and link them together with the awk command.  This is fairly
> straightforward with the standard tools.
> 
> find . -type f -name '*.myext' -exec \
>   awk '/keywordA/{good=1} /keywordB/{good=0;exit} END{exit !good}' {} \; -print
> 
> 
> Testing:
> 
> wooledg:~$ mkdir /tmp/x && cd "$_"
> wooledg:/tmp/x$ mkdir -p a/b/c a/b/d
> wooledg:/tmp/x$ echo keywordA > a/b/c/good.myext
> wooledg:/tmp/x$ echo keywordA keywordB > a/b/d/bad.myext
> wooledg:/tmp/x$ find . -type f -name '*.myext' -exec \
> >   awk '/keywordA/{good=1} /keywordB/{good=0;exit} END{exit !good}' {} \; -print
> ./a/b/c/good.myext
> 
> 
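
One cost of -exec ... {} \; is one awk process per file. If that ever matters on a big tree, here is a sketch of a batched variant: find hands many files to each awk with {} +, and awk uses FNR == 1 to notice each new file. It sticks to POSIX awk features. Caveats I know of: after keywordB it still reads (and ignores) the rest of that file, and empty files are never reported, which is right anyway since they cannot contain keywordA. The function name is hypothetical.

```shell
# batched_search: hypothetical name.  Same result as the per-file version,
# but find passes many files to each awk invocation ({} + instead of {} \;).
# find's names all start with ./ so awk cannot mistake one for a var=value operand.
batched_search() {
  find . -type f -name '*.myext' -exec awk '
    FNR == 1 {                     # first line of a new file
      if (NR > 1 && good) print prev
      good = 0; skip = 0; prev = FILENAME
    }
    skip        { next }           # keywordB already seen in this file
    /keywordA/  { good = 1 }
    /keywordB/  { good = 0; skip = 1 }
    END         { if (NR > 0 && good) print prev }
  ' {} +
}
```
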
> Now, the obvious unstated part of the question is that he will want
> keywordA and keywordB to be passed as parameters (although knowing him,
> he will require 17 messages to tell us this).
> 
> This is where it actually gets "hard", because the obvious thing to do
> would be to change the quotes on the awk command and embed $1 and $2 in
> it directly.  That is a TRAP.  It's a code injection bug, because the
> parameters given by the user could contain code that is meaningful to awk,
> which would lead to unexpected results.
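
To make the trap concrete, here is a deliberately unsafe function next to the -v version, both fed one hypothetical hostile "pattern". Spliced into the program text, x/ + 1 + /y turns the rule into /x/ + 1 + /y/{good=1}, an arithmetic condition that is always at least 1, so the file "matches" although it contains neither x nor y. Passed with -v, the same string is only ever used as a regex and does not match. Both function names are made up for the demo, and a real attacker could do worse, since awk has system().

```shell
# unsafe_match / safe_match PATTERN FILE: hypothetical demo functions.
# UNSAFE: the shell pastes $1 into the awk program, so awk parses it as code.
unsafe_match() {
  awk "/$1/"'{good=1} END{exit !good}' "$2"
}
# SAFE: $1 arrives as the value of an awk variable and is used only as a regex.
safe_match() {
  awk -v pat="$1" '$0 ~ pat {good=1} END{exit !good}' "$2"
}

f=$(mktemp) && printf 'nothing of interest here\n' > "$f"
payload='x/ + 1 + /y'     # hypothetical hostile argument
unsafe_match "$payload" "$f" && echo "unsafe: match" || echo "unsafe: no match"
safe_match "$payload" "$f" && echo "safe: match" || echo "safe: no match"
rm -f "$f"
```
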
> 
> For that part of the program, I refer you to:
> 
> https://mywiki.wooledge.org/BashProgramming/05
> 
> I would use the "awk variables" approach for this one:
> 
> #!/bin/sh
> if test "$#" != 2; then
>   printf "usage: %s goodpat badpat\n" "$0" >&2
>   exit 1
> fi
> 
> find . -type f -name '*.myext' -exec \
>   awk -v goodpat="$1" -v badpat="$2" \
>     '$0 ~ goodpat {good=1} $0 ~ badpat {good=0;exit} END{exit !good}' {} \; \
>   -print
> 
> 
> And, testing:
> 
> wooledg:/tmp/x$ set -- wordA wordB
> wooledg:/tmp/x$ find . -type f -name '*.myext' -exec \
> >   awk -v goodpat="$1" -v badpat="$2" \
> >     '$0 ~ goodpat {good=1} $0 ~ badpat {good=0;exit} END{exit !good}' {} \; \
> >   -print
> ./a/b/c/good.myext
> 
> 
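
One wrinkle with -v: awk treats the value of a -v assignment like the inside of an awk string literal, so backslash escapes in it are interpreted, and what happens to something like a\.b varies between implementations. POSIX awk also offers ENVIRON, which hands the value over untouched. Below is a sketch of the same search using that; the function name is made up, and the prefix assignments put the two variables only into find's environment, which awk inherits.

```shell
# env_search GOODPAT BADPAT: hypothetical name; same search as the script
# above, but the patterns travel through the environment instead of -v.
env_search() {
  if test "$#" != 2; then
    printf 'usage: env_search goodpat badpat\n' >&2
    return 1
  fi
  # These assignments apply only to find's environment; awk inherits them
  # and reads them via ENVIRON, with no backslash-escape processing.
  goodpat=$1 badpat=$2 find . -type f -name '*.myext' -exec \
    awk '$0 ~ ENVIRON["goodpat"] {good=1}
         $0 ~ ENVIRON["badpat"]  {good=0; exit}
         END {exit !good}' {} \; \
    -print
}
```
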
> And then, the obvious next extension after THAT would be to make the
> filename extension a parameter.  The shell part of that one is super
> easy (no code injection problems with find -name), so I won't bother
> showing it.
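
For what it's worth, here is one possible answer to the extension question: an optional -e flag with a hardcoded default. The flag letter, the default and the function name are arbitrary choices of mine, not anything the OP asked for. As Greg says, find -name treats the value as a glob pattern, not code, so no extra care is needed there.

```shell
# opt_search [-e EXT] GOODPAT BADPAT: hypothetical name and interface.
# In a standalone script this would be top-level code instead of a function;
# note that ext and opt are ordinary (global) shell variables here.
opt_search() {
  ext=myext                      # hardcoded default (an arbitrary choice)
  OPTIND=1                       # reset so the function can be called repeatedly
  while getopts e: opt; do
    case $opt in
      e) ext=$OPTARG ;;
      *) printf 'usage: opt_search [-e ext] goodpat badpat\n' >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))
  if test "$#" != 2; then
    printf 'usage: opt_search [-e ext] goodpat badpat\n' >&2
    return 1
  fi
  find . -type f -name "*.$ext" -exec \
    awk -v goodpat="$1" -v badpat="$2" \
      '$0 ~ goodpat {good=1} $0 ~ badpat {good=0;exit} END{exit !good}' {} \; \
    -print
}
```
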
> 
> At that point, the user interface becomes the real issue.  Do you
> put the extension argument first, or last?  Do you make it an option?
> Do you hardcode a default extension, or does the lack of a specified
> extension mean that you drop the -name filter altogether?  Or do you
> give up the command line interface entirely, and go with a Tk dialog?
> 
> But he'll never, ever, EVER be able to answer those questions, so we
> won't have to worry about it.

No, but we all learn something from these posts; at least, I do.

Cheers,
David.