
Re: [PATCH] userdiff: two simplifications of patterns for rust

On 30.05.19 at 20:59, Ævar Arnfjörð Bjarmason wrote:
> On Thu, May 30 2019, Johannes Sixt wrote:
>> - Do not enforce (but assume) syntactic correctness of language
>>   constructs that go into hunk headers: we only want to ensure that
>>   the keywords actually are words and not just the initial part of
>>   some identifier.
>> - In the word regex, match numbers only when they begin with a digit,
>>   but then be liberal in what follows, assuming that the text that is
>>   matched is syntactically correct.
> I don't know if this is possible for Rust (but very much suspect so...),
> but I think that in general we should aim to be more forgiving than not
> with these patterns.

The C/C++ hunk header pattern is actually very forgiving:
it takes every line that begins with an un-indented letter. That works
very well in C because C does not have nested functions and it is
typical that the function definition lines are not indented. But that
breaks down with C++: indented function definitions are very common;
they happen inside class and namespace definitions. Such functions are
not picked up, and we have lived with that so far (at least, I have).
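To make the C/C++ behaviour concrete, here is a minimal Python sketch. It assumes a simplified stand-in pattern `^[A-Za-z_]` for "a line that begins with an un-indented letter" (it is not the exact regex in userdiff.c), and the sample source lines are made up. It shows why indented C++ member functions never become hunk headers: the enclosing `namespace` or `class` line is picked instead.

```python
import re

# Simplified stand-in for "every line that begins with an un-indented
# letter"; this is NOT the exact pattern shipped in git's userdiff.c.
FUNCNAME = re.compile(r"^[A-Za-z_]")


def hunk_header_candidates(lines):
    """Return the lines the simplified pattern would pick as hunk headers."""
    return [line for line in lines if FUNCNAME.match(line)]


# Hypothetical C/C++ source, one entry per line.
cpp_source = [
    "int main(void)",                  # C: un-indented definition, picked
    "{",
    "}",
    "namespace util {",                # C++: namespace opener, picked
    "class Widget {",                  # C++: class opener, picked
    "    void draw() {",               # inline member, indented, skipped
    "    }",
    "};",
    "    int Widget::size() const {",  # definition inside namespace, skipped
    "}",
]

for line in hunk_header_candidates(cpp_source):
    print(line)
```

Running it prints `int main(void)`, `namespace util {` and `class Widget {`; neither member function definition is reported, so a hunk inside them would be labelled with the class or namespace line.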

> Because, as the history of userdiff.c shows, new keywords get introduced
> into these languages, and old git versions survive for a long time. If
> the syntax is otherwise fairly regular perhaps we don't need to hardcode
> the list of existing keywords?

We are talking about (1) hunk header lines (not something really
important) and (2) programming languages: new keywords don't pop up
every month. Granted, inventing new languages is en vogue these days.
But really, I mean, WTH?

Having available keywords to recognize hunk header candidates helps a
lot. I thought long and hard about a possible pattern for C++, but I gave up,
because the language is so rich and there are no suitable keywords.
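A minimal sketch of what "the keywords actually are words and not just the initial part of some identifier" means in a keyword-based pattern. The keyword list, the modifier list and both regexes are hypothetical illustrations, not the patterns in git's userdiff.c. The prefix-only variant also fires on ordinary statements whose identifiers merely start with `impl` or `mod`; requiring a non-identifier character (or end of line) after the keyword rules those out without enforcing anything else about the syntax.

```python
import re

# Hypothetical keyword and modifier lists, for illustration only; they
# are NOT the patterns used by git's userdiff.c.
KEYWORDS = r"(fn|struct|enum|impl|trait|mod)"
MODIFIERS = r"((pub|async|const|unsafe)[\t ]+)*"

# Prefix match: also fires on identifiers that merely start with a keyword.
naive = re.compile(r"^[\t ]*" + MODIFIERS + KEYWORDS)

# Whole-word match: the keyword must be followed by a character that
# cannot continue an identifier, or by the end of the line.
whole_word = re.compile(r"^[\t ]*" + MODIFIERS + KEYWORDS + r"([^A-Za-z0-9_]|$)")

# Made-up Rust lines: two definitions, two ordinary statements.
lines = [
    "pub fn new() -> Self {",
    "impl<T> Stack<T> {",
    "    implement_me();",
    "    modules.push(m);",
]

for line in lines:
    print(repr(line), bool(naive.match(line)), bool(whole_word.match(line)))
```

The naive pattern accepts all four lines; the whole-word pattern accepts only the two definitions.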

-- Hannes