Re: Questions on GSoC 2019 Ideas

On Sat, Mar 2, 2019 at 4:09 PM Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
> On 03/01, Duy Nguyen wrote:
> > On Fri, Mar 1, 2019 at 5:20 AM Christian Couder
> > <christian.couder@xxxxxxxxx> wrote:
> > >
> > > Hi Matheus,
> > >
> > > On Thu, Feb 28, 2019 at 10:46 PM Matheus Tavares Bernardino
> > > <matheus.bernardino@xxxxxx> wrote:
> > > >
> > > > I've been on the mailing list for a couple of weeks now, mainly working
> > > > on my GSoC micro-project[1] and on other patches that derived from it.
> > > > I have also been contributing to the Linux kernel for half a year,
> > > > but am now mainly just supporting other students here at USP.
> > > >
> > > > I have read the ideas page for the GSoC 2019 and many of them interest
> > > > me. Also, looking around git-dev materials on the web, I got to the
> > > > GSoC 2012 ideas page. And this one got my attention:
> > > > https://github.com/peff/git/wiki/SoC-2012-Ideas#improving-parallelism-in-various-commands
> > > >
> > > > I'm interested in parallel computing and that has been my research
> > > > topic for about a year now. So I would like to ask what's the status
> > > > of this GSoC idea. I've read git-grep and saw that it is already
> > > > parallel, but I was wondering if there is any other part of Git
> > > > where adding parallelism has already been considered in order to
> > > > achieve greater performance. And also, if this could, perhaps, be a
> > > > GSoC project.
> > >
> > > I vaguely remember that we thought at one point that all the
> > > low-hanging fruit had already been taken in this area but I might be
> > > wrong.
> >
> > We still have to remove some global variables, which is quite easy to
> > do, before one could actually add mutexes and stuff to allow multiple
> > pack access. I don't know though if removing global variables is
> > that exciting for GSoC, or if both tasks could fit in one GSoC.
> > Adding parallel access is not that hard, I think, once you know
> > packfile.c and sha1-file.c relatively well. It's mostly dealing with
> > caches and all the sliding access windows safely.
>
> I'm not very familiar with what's required here, but reading the above
> makes me think it's likely too much for a GSoC project.  I think I'd
> be happy with a project that declares removing the global variables as
> the main goal, and adding parallelism as a potential bonus.

Yeah, I think that the main issue, now that Duy found something that
could be a GSoC project, is that the potential mentors are not
familiar with the pack access code. It means that Matheus would
probably not get a lot of help from his mentors when he would work on
adding parallelism.

That may not be too big a problem though if Matheus is OK with asking
many technical questions on the mailing list. It seems to me that he could

> I'm a bit wary of a too large proposal here, as we've historically
> overestimated what kind of project is achievable over a summer (I've
> been there myself, as my GSoC project was also more than I was able to
> do in a summer :)).  I'd rather have a project whose goal is rather
> small and can be expanded later, than having something that could
> potentially take more than 3 months, where the student (or their
> mentors) have to finish it after GSoC.

Yeah, I agree with your suggestion about a project that declares
removing the global variables as the main goal, and adding parallelism
as a potential bonus.
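
To make "removing the global variables" a bit more concrete, here is a
rough sketch of the general pattern. It is not taken from packfile.c or
sha1-file.c; the struct and function names are hypothetical and only
illustrate the idea of moving file-scope counters into a struct that
carries its own mutex, so that parallel access can be added afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for bookkeeping that would otherwise live in
 * file-scope globals. Not Git's real data structure.
 */
struct pack_window_stats {
	pthread_mutex_t lock;   /* protects the two counters below */
	size_t open_windows;    /* number of currently mapped windows */
	size_t mapped_bytes;    /* total bytes currently mapped */
};

void pack_stats_init(struct pack_window_stats *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->open_windows = 0;
	s->mapped_bytes = 0;
}

/* Record that a window of `len` bytes was mapped. */
void pack_stats_open_window(struct pack_window_stats *s, size_t len)
{
	pthread_mutex_lock(&s->lock);
	s->open_windows++;
	s->mapped_bytes += len;
	pthread_mutex_unlock(&s->lock);
}

/* Record that a window of `len` bytes was unmapped. */
void pack_stats_close_window(struct pack_window_stats *s, size_t len)
{
	pthread_mutex_lock(&s->lock);
	assert(s->open_windows > 0 && s->mapped_bytes >= len);
	s->open_windows--;
	s->mapped_bytes -= len;
	pthread_mutex_unlock(&s->lock);
}

void pack_stats_destroy(struct pack_window_stats *s)
{
	pthread_mutex_destroy(&s->lock);
}
```

The first step (passing the struct around instead of touching globals)
is useful on its own; the mutex is the part that would belong to the
"potential bonus".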

One thing I am still worried about is whether adding parallelism is
actually likely to get us a significant performance improvement. If
the performance of this code is bounded by disk or memory access, then
adding parallelism might not bring any benefit. (It could perhaps
decrease performance if memory locality gets worse.) So I'd like some
confirmation, either from running some tests or from experienced Git
developers, that it is likely to be a win.
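
As a back-of-the-envelope check, Amdahl's law gives an upper bound on
what threads can buy. The sketch below assumes, as a simplification,
that the disk- or memory-bound part of the work does not scale at all
with the number of threads; the function name is made up for
illustration and no measured Git numbers are implied.

```c
#include <assert.h>

/*
 * Amdahl's law: upper bound on the speedup with `threads` workers when
 * only `parallel_fraction` (between 0.0 and 1.0) of the run time scales
 * with the number of threads.
 */
double amdahl_speedup(double parallel_fraction, unsigned int threads)
{
	assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
	assert(threads > 0);
	return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}
```

For example, if only half of the run time is spent in work that scales,
eight threads give at most about a 1.78x speedup, and no number of
threads can exceed 2x. Measuring that fraction would be the
confirmation asked for above.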