Re: git grep -P fatal: pcre_exec failed with error code -8




Hello,

thanks for your answer.

I understand that PCRE's stack can get exhausted for some files, but in such cases git grep should proceed with the other files and print at the end, on stderr, the files to which the pattern could not be applied. Such behaviour would be more useful than the current one.

Regards
  Dilian
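
The behaviour requested above can be approximated outside of Git. What follows is only a minimal sketch, not something git grep offers: the helper name grep_each_file and its run parameter are hypothetical, and it assumes that git grep exits with 0 on a match, 1 on no match, and some other status on a fatal error.

```python
import subprocess


def grep_each_file(pattern, files, run=None):
    """Search one file at a time; return (matched_output, failed_files)."""
    if run is None:
        # Default runner: one "git grep -P" process per file.
        def run(path):
            proc = subprocess.run(
                ["git", "grep", "-P", "-e", pattern, "--", path],
                capture_output=True,
                text=True,
            )
            return proc.returncode, proc.stdout

    output, failed = [], []
    for path in files:
        code, out = run(path)
        if code == 0:
            output.append(out)      # at least one matching line
        elif code != 1:
            failed.append(path)     # assumed: any other status is an error
        # code == 1 is assumed to mean "no match"; nothing to record
    return "".join(output), failed
```

A caller could print the first element to stdout and the second to stderr at the end. Starting one process per file is slow on a large tree, so this only illustrates the reporting, not a practical replacement.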

On 11/05/2017 03:16 AM, Jeff King wrote:
On Sun, Nov 05, 2017 at 01:06:21AM +0100, Дилян Палаузов wrote:

with git 2.14.3 linked with libpcre.so.1.2.9 when I do:
   git clone https://github.com/django/django
   cd django
   git grep -P "if.*([^\s])+\s+and\s+\1" \
      django/contrib/admin/static/admin/js/vendor/select2/select2.full.min.js
the output is:
   fatal: pcre_exec failed with error code -8

Code -8 is PCRE_ERROR_MATCHLIMIT. And "man pcreapi" has this to say:

   The match_limit field provides a means of preventing PCRE from
   using up a vast amount of resources when running patterns that
   are not going to match, but which have a very large number of
   possibilities in their search trees. The classic example is a
   pattern that uses nested unlimited repeats.

   Internally, pcre_exec() uses a function called match(), which
   it calls repeatedly (sometimes recursively). The limit set by
   match_limit is imposed on the number of times this function is
   called during a match, which has the effect of limiting the
   amount of backtracking that can take place. For patterns that
   are not anchored, the count restarts from zero for each position
   in the subject string.

   When pcre_exec() is called with a pattern that was successfully
   studied with a JIT option, the way that the matching is executed
   is entirely different. However, there is still the possibility
   of runaway matching that goes on for a very long time,
   and so the match_limit value is also used in this case (but in
   a different way) to limit how long the matching can continue.

   The default value for the limit can be set when PCRE is built;
   the default default is 10 million, which handles all but the
   most extreme cases. You can override the default by supplying
   pcre_exec() with a pcre_extra block in which match_limit is
   set, and PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If
   the limit is exceeded, pcre_exec() returns
   PCRE_ERROR_MATCHLIMIT.

So your pattern is just really expensive and is running afoul of pcre's
backtracking limits (and it's not helped by the fact that the file is
basically one giant line).
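
To make the idea of a match limit concrete, here is a toy backtracking matcher with a call counter. It is not PCRE's implementation, only a sketch under simplifying assumptions: the dialect supports just literals, "." and "x*", and the names search, match_here and MatchLimitExceeded are made up for illustration. As in the man page excerpt, the counter restarts for each starting position of an unanchored search.

```python
class MatchLimitExceeded(Exception):
    """Raised when one starting position needs too many matcher calls."""


def search(pattern, text, match_limit=10_000):
    """Unanchored search for a tiny dialect: literals, '.', and 'x*'."""
    for start in range(len(text) + 1):
        calls = 0  # the count restarts for each starting position

        def match_here(p, t):
            nonlocal calls
            calls += 1
            if calls > match_limit:
                raise MatchLimitExceeded(f"limit hit at position {start}")
            if p == len(pattern):
                return True
            if p + 1 < len(pattern) and pattern[p + 1] == "*":
                # Greedy repeat: consume as much as possible, then back off
                # one character at a time until the rest of the pattern fits.
                end = t
                while end < len(text) and pattern[p] in (".", text[end]):
                    end += 1
                for stop in range(end, t - 1, -1):
                    if match_here(p + 2, stop):
                        return True
                return False
            if t < len(text) and pattern[p] in (".", text[t]):
                return match_here(p + 1, t + 1)
            return False

        if match_here(0, start):
            return True
    return False
```

With the pattern "a*a*a*b" and a subject of 30 "a" characters, there is no match, and the matcher needs several thousand calls from the first position alone, so a limit of 1000 raises MatchLimitExceeded while a limit of 100000 lets it finish and return False. A very long single line gives the repeats that much more room to backtrack.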

There's no way to ask Git to specify a larger match_limit to pcre, but
you might be able to construct your pattern in a way that involves less
backtracking. It looks like you're trying to find things like "if foo
and foo"?

Should the captured term actually be "([^\s]+)" (with the "+" on the
_inside_ of the capture)? Or maybe I'm just misunderstanding your goal.

-Peff
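
The difference between the two placements of "+" can be checked on short strings. This is a small sketch using Python's re module rather than PCRE (for these particular patterns the two are assumed to agree); the helper name captured is hypothetical.

```python
import re

# "+" outside the group: the group is repeated and keeps only its last
# iteration, so \1 refers to a single character.
outside = re.compile(r"if.*([^\s])+\s+and\s+\1")

# "+" inside the group: the group captures the whole word.
inside = re.compile(r"if.*([^\s]+)\s+and\s+\1")


def captured(regex, line):
    """Return what group 1 captured, or None if the line does not match."""
    m = regex.search(line)
    return m.group(1) if m else None
```

On "if foo and foo", captured(inside, ...) returns "foo", while captured(outside, ...) returns None: the group holds only the final "o", and the text after "and" starts with "f". The outside form does match a line such as "if foo and oops", capturing "o", which is probably not what was intended.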