Web lists-archives.com

Re: New lintian warning: vcs-deprecated-in-debian-infrastructure

Am 23.03.2018 um 05:27 schrieb Guillem Jover:
> On Thu, 2018-03-22 at 23:23:22 +0100, Markus Koschany wrote:
>> Am 22.03.2018 um 15:40 schrieb Guillem Jover:
>>> On Thu, 2018-03-22 at 12:47:44 +0200, Lars Wirzenius wrote:
>>>> On Thu, 2018-03-22 at 09:58 +0100, Andreas Tille wrote:
>> You need online access to make use of the above information in any way.
>> If you want to contact the maintainer you need internet access, if you
>> want to visit the upstream homepage you need internet access, etc.
> Not really. For starters, we'd need to have network access to be able
> to run dch or lintian, because they do check whether an upload is
> supposed to be an NMU, team upload, etc. This seems ridiculous.

It is not that difficult to pass dch --team, dch --nmu or gbp dch manually.

If you were involved in team maintenance you would know that the
Uploaders field is often completely outdated. The only way you can see
who maintains a package is by looking at the Git history or upload
history in tracker.debian.org. We had/have contributors who were
mentioned as Uploaders in hundreds of packages and now they only can be
removed by uploading a new package. A web based solution with a database
would solve this problem in seconds. NMU or Team uploads are
interchangeable terms and not very interesting when all you want to know
is: Who did the last upload?

> Some people like to add branch information in the Vcs- fields, this
> would then require to support adding that info before the package
> exists in tha vcs or archive branch (which means less sanity checks)
> or afterwards (which means it's prone to be forgotten).

I don't mind if you have special requirements but I don't see a reason
why this should get in the way of the majority of contributors who can
live with the standards. My point is that the default should be
different. Don't require contributors to maintain information which can
be maintained more efficiently in tracker.debian.org. Prefer convention
over configuration.

> The Section and Priority are overridden by ftp-masters, but the
> maintainer tends to fill it more or less correctly. Without them,
> ftp-masters would need to come up with values from scratch, which
> is more difficult than correcting them if they seem off.

Almost all packages are priority optional. That could simply be the
default. Our essential, required or standard packages change very
rarely. If you have to introduce a new package with a higher priority as
optional it has to go through NEW as every other package. While this
processing takes place we could find a way to ensure that the priority
is correct.

Removing Section and Priority from debian/control would not be an urgent
priority for me because they don't change frequently like
Vcs/homepage/maintainer but if we think outside the box it could be done.

> If one has got to update the data for several of those fields, it can
> currently be done off-line, and then pushed when one's back on-line,
> this is getting back into the centralized development model.

I don't see how this outweighs the benefit of adding and changing
information _once_ when you are back online. Instead of preparing
hundreds of uploads, compiling the packages on our infrastructure and
duplicating the same information on hundreds of mirrors around the world
(which wastes time, energy and disk space) you maintain your _meta
information_ in a web application. Most, especially younger people, are
familiar with this concept and I would expect this would seem more
natural to them and probably even attract more contributors.

That doesn't stop you from doing the real development work offline.
Besides a web based solution like tracker would provide the unique
opportunity to get more people without upload permissions involved.
Imagine a contributor that corrects your outdated homepage entry and
upstream Git repository while you are offline and as soon as you are
back online the only thing you have to do is to approve it.

> These fields/files are generally useful outside Debian, if we stop
> providing them then it's to be expected that tooling might bit-rot.
> While requiring someone with a tiny local archive to install a tracker
> instance seems just onerous.

There is no need for a local tracker instance. An offline feature is not
important as long as you are online once in a while which is to be
expected nowadays. The meta information just moves to a new central
place. Yes, you have to adapt some tools but you gain all the benefits
of a REST API with a unique interface and data in JSON/XML/YAML/HTML
whatever. It will be easier to access for random people and I can
imagine it would simplify the information exchange between different
distributions not just Debian derivatives.

>> These
>> information are distribution-independent because they are either the
>> same like "Homepage" or you could simply look them up if you define a
>> common and central platform like tracker.debian.org as your main hub for
>> external/additional/random information about package X.
> This still means keeping the package and its metadata separate, which
> is prone to be forgotten by maintainers when updating packaging locally.
> So it's a locality problem too. Look at the current debtags coverage. :(

Well, you can adjust Lintian in a way to check the web service for the
information which was formerly present in debian/control and get all the
warnings and errors people like. One of the problems with debtags is
that they are just another way to categorize information which you can
obtain by other means and they were never a real requirement by Policy.
I am optimistic this would change in the future if we facilitate random
contributions and a centralized web service is ideal for this scenario
in my opinion.

>> Look how Fedora
>> have solved the same issue. They have https://src.fedoraproject.org/ and
>> everything is organized by convention in an identical way. There is no
>> need for them to put the same information into their spec files and they
>> also have derivatives. Be sure Ubuntu or Mint users don't care about our
>> Vcs or maintainer fields as well.
> Well, Fedora is not Debian, we have wildly different history, practices
> and workflows for a reason, etc. And, I'd say what you provide is actually
> a counter-example. They list the Group (our Section), Url (our Homepage)
> and the SourceN (our watch file) in the spec file. And they do not need
> (ahem) Vcs fields because they have a unified and properly namespaced (!)
> git managed set of packaging-only repos. And when it comes to the
> contributors, well in many cases it does not even reflect reality between
> what's listed on the site and who is doing the changes, so there you go.

Exactly they use properly namespaced! Git repositories. That's what I
meant by using conventions and you know who maintains a package by
looking at the recent Git commits. This is certainly something we could
do as well despite having a different "history". There is still room for
improvement (homepage, watch file or section) but here we have the
opportunity of being the first ones! I could imagine that we organize
all Git repositories via their Sections like


If Gitlab was more flexible we could create virtual ownerships for teams
or individuals then but the basic structure and layout would be
self-explanatory. This is just an idea, my point is you can separate
metadata from data which is needed to build a package and you can even
derive information by using conventions.

>>>> All of the above can change and it's silly to have to make a source
>>>> upload to change them. They also easily get out of date and so are
>>>> likely out of date for a stable release.
>>> Yes, it might be silly to have to upload a package just and only to
>>> update that information, or having that data being permanently
>>> out-of-date on stable. But this problem can be easily solved already,
>>> the archive, and most (if not all!?) repo managers have had the
>>> concept of overrides for a very long time, starting with things like
>>> dpkg-scanpackages/dpkg-scansources!
>> I don't understand why some people always think it is smart to create or
>> use some new program or tool to solve a problem when the most efficient
>> way would be to solve a problem non-programmatically. Reduce the
>> complexity by defining conventions which all developers have to follow
>> in Debian like maintaining external information in tracker.debian.org
>> instead of the source package.
> That depends on the context. In the Debian context, people like to do
> and work in very different ways, that's why we have all these many
> workflows, helpers and vcs used (it's both a curse and a blessing).
> Would you be happy if the project forced upon you a workflow, helper
> and vcs you cannot stand f.ex.? Imagine trying to switch in the future
> when something new and interesting pops up.

Yes, I am able and willing to adapt to a specific workflow as long as
everyone else pulls together. If the project decided to move to
Subversion as the preferred Vcs, so be it. Then we develop tools to
facilitate working with it and concentrate on making it a success.
Debian's diversity has its benefits and there is a reason why I mostly
contribute to Debian but I don't need absolute choice when it comes to
Debian packaging. If 90 % of all developers prefer Git, then we settle
for that, simple. There are always people who dislike something and
there is no way to please everyone. Nevertheless I expect that the
minority goes with the flow as soon as a decision has been made. They
can always choose to use different tools but they shouldn't expect that
the majority is obliged to support that.

> Also, just backtracking a bit, this subthread was triggered by the Vcs
> URLs needing updates due to the salsa switch. Even with a solution
> based entirely on tracker.d.o, you could not do any mass conversion
> there, because salsa does not follow any namespace convention at all.
>> A switch of Vcs platform would become a
>> trivial matter in the future by replacing some strings on the server. It
>> would take seconds and could be solved by a single person.
> The current switch should be proof that this cannot be done right now
> anyway.

Yes, first we have to agree on that our current model is insufficient
and inflexible. I participate in this thread because I converted SVN
repos to git://git.debian.org, then to git://anonscm.debian.org, then to
https://anonscm.debian.org and now to https://salsa.debian.org. That's nuts.

>> I have already seen hundreds of uploads just for the sake of changing the
>> Vcs-fields in debian/control. That is crazy.
> Well sure. But what also seems crazy to me is that we are discussing some
> kind of centralized storage for these URLs (because they change too often)
> that cannot be automatically migrated anyway, because there's no global 1:1
> mapping. Instead of, I don't know, considering that if we had proper stable
> URLs preserved, we'd not need to do any update at all? :)

Others said it before but I thought anonscm would be the final URL
forever. We really should do better here. Either we move the URL out of
debian/control, so that it can be changed more quickly and conveniently
or we decide on a proper Git namespace (see my comments above). Anyway
the current situation is undeniably a design flaw.

>>>> I think this would be a better thing to spend time on than talking
>>>> again about keeping anonscm around.
>>> And this is still missing the point, as I've also said in the past. The
>>> worst part of this is not IMO to have to update the Vcs fields, which
>>> TBH is one more time too many. It's that it implies any downstream, any
>>> service pulling from the repo, any mirror and checkout, needs to be
>>> noticed in time (while the redirect is in place, because there's now
>>> recommendation to remove it after the next upload!) and then someone or
>>> something needs to update all those references lying around. :(
>> If you store this information on tracker.debian.org there would be no
>> delay and no inconvenience for downstreams at all. In most circumstances
>> they have their own "tracker.debian.org" service like launchpad.net.
>> Things like the maintainer or Vcs field get overridden anyway and other
>> information could be provided via a simple API and thus kept in sync
>> with their services.
> I'm not sure what this has to do with what I said above. Who or what is
> going to fix the URLs in remotes of all the existing chekouts, or things
> like say openhub.net, mirrors in github.com (or other hosting) sites,
> etc., etc. How does having a centralized storage or an API help here at
> all?

Nobody is going to fix all remote checkouts in existence automatically.
Whenever someone decides to change the URL you have to change it as
well. My point is that distributions use their own version control
systems and services which makes them independent from us. A centralized
API that provides such meta information is more accessible and provides
more options to retrieve data in different forms and simplifies
information exchange. This makes it superior to the status quo.

Attachment: signature.asc
Description: OpenPGP digital signature