Web lists-archives.com

Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error




Hi Ananya,

On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:

> On Thu, 4 Oct 2018 at 19:56, Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
> >
> > Hi Ananya,
> >
> > thank you for taking the time to write this patch!
> >
> > On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:
> >
> > > the forward slash character should be escaped with backslash. Fix
> > > Unescaped forward slash error in Python regex statements.
> > >
> > > Signed-off-by: Ananya Krishna Maram<ananyakittu1997@xxxxxxxxx>
> >
> > That explains pretty well what you did, but I wonder why the forward slash
> > needs to be escaped? I would understand if we enclosed the pattern in
> > `/<regex>/`, as it is done e.g. in Javascript, but we do not...
> 
> You are correct, the code would execute either ways. But when I came across
> this line, I didn't get it's meaning instantly because as per standards, forward
> slash has to be escaped. In fact when open source code is written according to
> standards, the code will be reachable to more people.

I am afraid that I do not follow... Regular expressions have quite a few
special characters, but forward slashes are not among them.

Meaning: if we had specified the regular expression thusly (in any
language that supports them to be specified in this way):

	/|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?/

then I would agree that this is a bug, and needs to be fixed. But we
specify it as a regular C string:

	"|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"

In this context, the backslash has an additional, nested meaning: it
escapes special characters in a C string, too. So writing

	"\\"

will actually result in a string consisting of a single backslash. And

	"\n"

would specify a string consisting of a single line feed character. Some C
compilers ignore incorrectly-escaped characters, i.e. "\/" would simply
expand to the forward slash.

However, you wanted to escape the forward slash in the regular expression.
To do that, you would have to write

	"\\/"

i.e. specifying, in a C string, a backslash character, followed by a
forward slash character.

However, the regular expression would then be interpreting the backslash
character as escape character in its own right, seeing the forward slash,
and replacing this sequence by a forward slash.

But it does not need to be escaped, when you specify the regular
expression the way we do. And the way we specified it is really the
standard when specifying regular expressions in C code, i.e. *without* the
suggested backslash.

Ciao,
Johannes

> 
> Thanks,
> Ananya.
> 
> > Thanks,
> > Johannes
> >
> > > ---
> > >  userdiff.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/userdiff.c b/userdiff.c
> > > index f565f6731..f4ff9b9e5 100644
> > > --- a/userdiff.c
> > > +++ b/userdiff.c
> > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
> > >        /* -- */
> > >        "[a-zA-Z_][a-zA-Z0-9_]*"
> > >        "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
> > > -      "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
> > > +      "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"),
> > >        /* -- */
> > >  PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
> > >        /* -- */
> > > --
> > > 2.17.1
> > >
> > >
>