Re: git grep -P fatal: pcre_exec failed with error code -8

On Sun, Nov 05, 2017 at 01:06:21AM +0100, Дилян Палаузов wrote:

> with git 2.14.3 linked with libpcre.so.1.2.9 when I do:
>   git clone https://github.com/django/django
>   cd django
>   git grep -P "if.*([^\s])+\s+and\s+\1" django/contrib/admin/static/admin/js/vendor/select2/select2.full.min.js
> the output is:
>   fatal: pcre_exec failed with error code -8

Code -8 is PCRE_ERROR_MATCHLIMIT. And "man pcreapi" has this to say:

  The match_limit field provides a means of preventing PCRE from
  using up a vast amount of resources when running patterns that
  are not going to match, but which have a very large number of
  possibilities in their search trees. The classic example is a
  pattern that uses nested unlimited repeats.

  Internally, pcre_exec() uses a function called match(), which
  it calls repeatedly (sometimes recursively). The limit set by
  match_limit is imposed on the number of times this function is
  called during a match, which has the effect of limiting the
  amount of backtracking that can take place. For patterns that
  are not anchored, the count restarts from zero for each position
  in the subject string.

  When pcre_exec() is called with a pattern that was successfully
  studied with a JIT option, the way that the matching is executed
  is entirely different. However, there is still the possibility
  of runaway matching that goes on for a very long time,
  and so the match_limit value is also used in this case (but in
  a different way) to limit how long the matching can continue.

  The default value for the limit can be set when PCRE is built;
  the default default is 10 million, which handles all but the
  most extreme cases. You can override the default by supplying
  pcre_exec() with a pcre_extra block in which match_limit is
  set, and PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If
  the limit is exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
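As a rough illustration of what that limit does (this is not PCRE's code), here is a toy backtracking matcher in Python. The names Budget, MatchLimitExceeded, char, plus, seq and search are all made up for this sketch. The counter plays the role of PCRE's count of match() calls, and the limit is 10,000 rather than PCRE's default of 10 million so the demo finishes quickly. As the man page describes for unanchored patterns, the count restarts at each starting position. The nested repeat (a+)+b run against a string of a's with no b is the classic "nested unlimited repeats" case:

```python
"""Toy backtracking matcher with a PCRE-style match limit.

Hypothetical sketch only: this is not PCRE's match() function. It just
counts matcher calls and gives up once a budget is exhausted, which is
the idea behind match_limit / PCRE_ERROR_MATCHLIMIT.
"""
from typing import Callable


class MatchLimitExceeded(Exception):
    """Stands in for PCRE_ERROR_MATCHLIMIT (-8) in this sketch."""


class Budget:
    """Counts calls of the toy matcher, like PCRE counts calls of match()."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.calls = 0

    def tick(self) -> None:
        self.calls += 1
        if self.calls > self.limit:
            raise MatchLimitExceeded(f"more than {self.limit} matcher calls")


Cont = Callable[[int], bool]
# A node tries to match at position i and, on success, hands the new
# position to the continuation k; returning False makes the caller backtrack.
Node = Callable[[Budget, str, int, Cont], bool]


def char(c: str) -> Node:
    """Match one literal character (counts as one call)."""
    def node(b: Budget, s: str, i: int, k: Cont) -> bool:
        b.tick()
        return i < len(s) and s[i] == c and k(i + 1)
    return node


def plus(inner: Node) -> Node:
    """Greedy one-or-more repeat of inner (each attempt counts as one call)."""
    def node(b: Budget, s: str, i: int, k: Cont) -> bool:
        b.tick()

        def after_one(j: int) -> bool:
            # Greedy: first try another iteration, then fall back to k.
            # The j > i guard stops an empty iteration from looping forever.
            return (j > i and node(b, s, j, k)) or k(j)

        return inner(b, s, i, after_one)
    return node


def seq(*nodes: Node) -> Node:
    """Concatenation: match each node in turn, threading the continuation."""
    if not nodes:
        return lambda b, s, i, k: k(i)
    head, rest = nodes[0], seq(*nodes[1:])

    def node(b: Budget, s: str, i: int, k: Cont) -> bool:
        return head(b, s, i, lambda j: rest(b, s, j, k))
    return node


def search(pattern: Node, s: str, limit: int) -> int:
    """Unanchored search: return the first start offset that matches, or -1.

    Like PCRE for unanchored patterns, the call count restarts from zero
    at each starting position (a fresh Budget per start).
    """
    for start in range(len(s) + 1):
        if pattern(Budget(limit), s, start, lambda j: True):
            return start
    return -1


if __name__ == "__main__":
    evil = seq(plus(plus(char("a"))), char("b"))   # (a+)+b
    benign = seq(plus(char("a")), char("b"))       # a+b
    print(search(benign, "a" * 30, limit=10_000))  # -1: fails within budget
    try:
        search(evil, "a" * 30, limit=10_000)
    except MatchLimitExceeded as err:
        print("gave up:", err)
```

The single repeat a+b fails on 30 a's using fewer than a hundred calls per starting position, while the nested (a+)+b has exponentially many ways to split the run and exhausts the budget at the very first position.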

So your pattern is just really expensive and is running afoul of pcre's
backtracking limits (and it's not helped by the fact that the file is
basically one giant line).

There's no way to ask Git to pass a larger match_limit to pcre, but
you might be able to construct your pattern in a way that involves less
backtracking. It looks like you're trying to find things like "if foo
and foo"?

Should the captured term actually be "([^\s]+)" (with the "+" on the
_inside_ of the capture)? Or maybe I'm just misunderstanding your goal.
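A quick way to see what the placement of the "+" changes is Python's re module. It is a different backtracking engine from PCRE, but for the constructs used here (\s, character classes, greedy repeats and the \1 backreference) it should behave the same. SUBJECT is a made-up sample line, and the sketch only checks what each pattern means, not how much backtracking either one costs in PCRE:

```python
import re

SUBJECT = "if foo and foo"  # hypothetical sample line

# Original pattern: the "+" repeats the group, so \1 only holds the
# single character captured by the last repetition.
original = re.compile(r"if.*([^\s])+\s+and\s+\1")

# Suggested variant: the "+" is inside the group, so \1 holds the whole run.
suggested = re.compile(r"if.*([^\s]+)\s+and\s+\1")

print(re.match(r"([^\s])+", "foo").group(1))  # o
print(re.match(r"([^\s]+)", "foo").group(1))  # foo
print(original.search(SUBJECT))               # None
print(suggested.search(SUBJECT).group(1))     # foo
```

With the "+" outside the group, "if foo and foo" does not match at all: \1 is just "o", and the word after "and" starts with "f". Note that \1 only has to match a prefix, so "if foo and foobar" also matches the suggested pattern.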

-Peff