Web lists-archives.com

Re: [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode




On Sat, Oct 27 2018, Nguyễn Thái Ngọc Duy wrote:

> In WM_PATHNAME mode (or FNM_PATHNAME), '*' does not match '/' and '**'
> can but only in three patterns:
>
> - '**/' matches zero or more leading directories
> - '/**/' matches zero or more directories in between
> - '/**' matches zero or more trailing directories/files
>
> When '**' is present but not in one of these patterns, the current
> behavior is consider the pattern invalid and stop matching. In other
> words, 'foo**bar' never matches anything, whatever you throw at it.
>
> This behavior is arguably a bit confusing partly because we can't
> really tell the user their pattern is invalid so that they can fix
> it. So instead, tolerate it and make '**' act like two regular '*'s
> (which is essentially the same as a single asterisk). This behavior
> seems more predictable.
>
> Noticed-by: dana <dana@xxxxxxx>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
> ---
>  Documentation/gitignore.txt | 3 ++-
>  t/t3070-wildmatch.sh        | 4 ++--
>  wildmatch.c                 | 4 ++--
>  wildmatch.h                 | 1 -
>  4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
> index d107daaffd..1c94f08ff4 100644
> --- a/Documentation/gitignore.txt
> +++ b/Documentation/gitignore.txt
> @@ -129,7 +129,8 @@ full pathname may have special meaning:
>     matches zero or more directories. For example, "`a/**/b`"
>     matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
>
> - - Other consecutive asterisks are considered invalid.
> + - Other consecutive asterisks are considered regular asterisks and
> +   will match according to the previous rules.
>
>  NOTES
>  -----
> diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
> index 46aca0af10..891d4d7cb9 100755
> --- a/t/t3070-wildmatch.sh
> +++ b/t/t3070-wildmatch.sh
> @@ -237,7 +237,7 @@ match 0 0 0 0 foobar 'foo\*bar'
>  match 1 1 1 1 'f\oo' 'f\\oo'
>  match 1 1 1 1 ball '*[al]?'
>  match 0 0 0 0 ten '[ten]'
> -match 0 0 1 1 ten '**[!te]'
> +match 1 1 1 1 ten '**[!te]'
>  match 0 0 0 0 ten '**[!ten]'
>  match 1 1 1 1 ten 't[a-g]n'
>  match 0 0 0 0 ten 't[!a-g]n'
> @@ -253,7 +253,7 @@ match 1 1 1 1 ']' ']'
>  # Extended slash-matching features
>  match 0 0 1 1 'foo/baz/bar' 'foo*bar'
>  match 0 0 1 1 'foo/baz/bar' 'foo**bar'
> -match 0 0 1 1 'foobazbar' 'foo**bar'
> +match 1 1 1 1 'foobazbar' 'foo**bar'
>  match 1 1 1 1 'foo/baz/bar' 'foo/**/bar'
>  match 1 1 0 0 'foo/baz/bar' 'foo/**/**/bar'
>  match 1 1 1 1 'foo/b/a/z/bar' 'foo/**/bar'
> diff --git a/wildmatch.c b/wildmatch.c
> index d074c1be10..9e9e2a2f95 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -104,8 +104,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  					    dowild(p + 1, text, flags) == WM_MATCH)
>  						return WM_MATCH;
>  					match_slash = 1;
> -				} else
> -					return WM_ABORT_MALFORMED;
> +				} else /* WM_PATHNAME is set */
> +					match_slash = 0;
>  			} else
>  				/* without WM_PATHNAME, '*' == '**' */
>  				match_slash = flags & WM_PATHNAME ? 0 : 1;
> diff --git a/wildmatch.h b/wildmatch.h
> index b8c826aa68..5993696298 100644
> --- a/wildmatch.h
> +++ b/wildmatch.h
> @@ -4,7 +4,6 @@
>  #define WM_CASEFOLD 1
>  #define WM_PATHNAME 2
>
> -#define WM_ABORT_MALFORMED 2
>  #define WM_NOMATCH 1
>  #define WM_MATCH 0
>  #define WM_ABORT_ALL -1

This patch looks good to me, but I think it's a bad state of affairs to
keep changing these semantics and not having something like a
"gitwildmatch" doc were we document this matching syntax.

Also I still need to dig up the work for using PCRE as an alternate
matching engine, the PCRE devs produced a bug-for-bug compatible version
of our wildmatch function (all the more reason to document it), so I
think they'll need to change it now that this is in, but I haven't
rebased those ancient patches yet.

Do you have any thoughts on how to proceed with getting this documented
/ into some stable state where we can specify it? Even if we don't end
up using PCRE as a matching engine (sometimes it was faster, sometimes
slower) I think it would be very useful if we can spew out "here's your
pattern as a regex" for self-documentation purposes.

Then that can be piped into e.g. "perl -Mre=debug" to see a step-by-step
guide for how the pattern compiles, and why it does or doesn't match a
given thing.