Web lists-archives.com

Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error




On Thu, 4 Oct 2018 at 20:56, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> Hi Ananya,
>
> On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:
>
> > On Thu, 4 Oct 2018 at 19:56, Johannes Schindelin
> > <Johannes.Schindelin@xxxxxx> wrote:
> > >
> > > Hi Ananya,
> > >
> > > thank you for taking the time to write this patch!
> > >
> > > On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:
> > >
> > > > the forward slash character should be escaped with backslash. Fix
> > > > Unescaped forward slash error in Python regex statements.
> > > >
> > > > Signed-off-by: Ananya Krishna Maram<ananyakittu1997@xxxxxxxxx>
> > >
> > > That explains pretty well what you did, but I wonder why the forward slash
> > > needs to be escaped? I would understand if we enclosed the pattern in
> > > `/<regex>/`, as it is done e.g. in Javascript, but we do not...
> >
> > You are correct, the code would execute either ways. But when I came across
> > this line, I didn't get it's meaning instantly because as per standards, forward
> > slash has to be escaped. In fact when open source code is written according to
> > standards, the code will be reachable to more people.
>
> I am afraid that I do not follow... Regular expressions have quite a few
> special characters, but forward slashes are not among them.
>
> Meaning: if we had specified the regular expression thusly (in any
> language that supports them to be specified in this way):
>
>         /|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?/
>
> then I would agree that this is a bug, and needs to be fixed. But we
> specify it as a regular C string:
>
>         "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"
>
> In this context, the backslash has an additional, nested meaning: it
> escapes special characters in a C string, too. So writing
>
>         "\\"
>
> will actually result in a string consisting of a single backslash. And
>
>         "\n"
>
> would specify a string consisting of a single line feed character. Some C
> compilers ignore incorrectly-escaped characters, i.e. "\/" would simply
> expand to the forward slash.
>
> However, you wanted to escape the forward slash in the regular expression.
> To do that, you would have to write
>
>         "\\/"
>
> i.e. specifying, in a C string, a backslash character, followed by a
> forward slash character.
>
> However, the regular expression would then be interpreting the backslash
> character as escape character in its own right, seeing the forward slash,
> and replacing this sequence by a forward slash.
>
> But it does not need to be escaped, when you specify the regular
> expression the way we do. And the way we specified it is really the
> standard when specifying regular expressions in C code, i.e. *without* the
> suggested backslash.

Aha!. this makes total sense. I was thinking from a general regular expression
point of view. But I should be thinking from C point of view and how C
might interpret this newly submitted string.
This explanation is very clear. Thanks for taking time to reply to my
patch. From next time on, I will try to think from
git project's point of view.

Thanks,
Ananya.

> Ciao,
> Johannes
>
> >
> > Thanks,
> > Ananya.
> >
> > > Thanks,
> > > Johannes
> > >
> > > > ---
> > > >  userdiff.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/userdiff.c b/userdiff.c
> > > > index f565f6731..f4ff9b9e5 100644
> > > > --- a/userdiff.c
> > > > +++ b/userdiff.c
> > > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
> > > >        /* -- */
> > > >        "[a-zA-Z_][a-zA-Z0-9_]*"
> > > >        "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
> > > > -      "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
> > > > +      "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"),
> > > >        /* -- */
> > > >  PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
> > > >        /* -- */
> > > > --
> > > > 2.17.1
> > > >
> > > >
> >