Web lists-archives.com

Re: [PATCH] coccicheck: process every source file at once

On Tue, Oct 02, 2018 at 03:55:19PM -0400, Jeff King wrote:
> On Tue, Oct 02, 2018 at 12:16:42PM -0700, Jacob Keller wrote:
> > make coccicheck is used in order to apply coccinelle semantic patches,
> > and see if any of the transformations found within contrib/coccinelle/
> > can be applied to the current code base.
> > 
> > Pass every file to a single invocation of spatch, instead of running
> > spatch once per source file.
> > 
> > This reduces the time required to run make coccicheck by a significant
> > amount of time:
> > 
> > Prior timing of make coccicheck
> >   real    6m14.090s
> >   user    25m2.606s
> >   sys     1m22.919s
> > 
> > New timing of make coccicheck
> >   real    1m36.580s
> >   user    7m55.933s
> >   sys     0m18.219s
> Yay! This is a nice result.
> It's also one of the things that Julia suggested in an earlier thread.
> One thing I wasn't quite sure about after digging into the various
> versions (1.0.4 on Debian stable/unstable, 1.0.6 in experimental, and
> pre-1.0.7 at the bleeding edge) was whether the old versions would be
> similarly helped (or work at all).
> I just replicated your results with 1.0.4.deb-3+b2 from Debian stable.
> It's possible there are older versions floating around, but for
> something developer-only like this, I think "in Debian stable" is a
> reasonable enough cutoff.

Linux build jobs on Travis CI run Ubuntu Trusty 14.04 LTS, and
therefore our static analysis build job still runs 1.0.0~rc19.deb-3.

This patch appears to work fine with that version, too, though note
that I also changed the build job to don't run two parallel jobs; for
the reason see below.

> > This is nearly a 4x decrease in the time required to run make
> > coccicheck. This is due to the overhead of restarting spatch for every
> > file. By processing all files at once, we can amortize this startup cost
> > across the total number of files, rather than paying it once per file.
> One minor side effect is that we lose the opportunity to parallelize
> quite as much. However, I think the reduction in total CPU makes it well
> worth that (and we still have 8 cocci files and growing, which should
> keep at least 8 cores busy).

One major side effect, however, is the drastically increased memory
usage resulting from processing all files at once.  With your patch on
top of current master:

  $ for cocci in contrib/coccinelle/*.cocci ; do command time -f 'max RSS: %MkB' make ${cocci}.patch ; done
       SPATCH contrib/coccinelle/array.cocci
  max RSS: 1537068kB
       SPATCH contrib/coccinelle/commit.cocci
  max RSS: 1510920kB
       SPATCH contrib/coccinelle/free.cocci
  max RSS: 1393712kB
       SPATCH contrib/coccinelle/object_id.cocci
       SPATCH result: contrib/coccinelle/object_id.cocci.patch
  max RSS: 1831700kB
       SPATCH contrib/coccinelle/qsort.cocci
  max RSS: 1384960kB
       SPATCH contrib/coccinelle/strbuf.cocci
  max RSS: 1395800kB
       SPATCH contrib/coccinelle/swap.cocci
  max RSS: 1393620kB
       SPATCH contrib/coccinelle/xstrdup_or_null.cocci
  max RSS: 1371716kB

Without your patch the max RSS lingers around 87048kB - 101980kB,
meaning ~15x - 18x increase

This might cause quite the surprise to developers who are used to run
'make -jN coccicheck'; my tiny laptop came to a grinding halt with

I think this side effect should be mentioned in the commit message.

Furthermore, the above mentioned static analysis build job on Travis
CI runs two parallel jobs.  Perhaps we should be considerate and
reduce that to a single job, even if that means that he build job will
run longer.

> I think recent versions of Coccinelle will actually parallelize
> internally, too, but my 1.0.4 doesn't seem to. That's probably what I
> was thinking of earlier (but this is still a win even without that).

I think Coccinelle parallelizes only when invoked with -j <N>, but
that option is not documented in 1.0.4.  I wrote "documented" instead
of "present", because e.g.:

  $ spatch --sp-file contrib/coccinelle/swap.cocci -j 2 a*.c

doesn't abort with "unknown option" and usage, but runs for a bit and
then outputs:

  init_defs_builtins: /usr/lib/coccinelle/standard.h
  HANDLING: abspath.c advice.c alias.c alloc.c apply.c archive.c archive-tar.c archive-zip.c argv-array.c attr.c
  Fatal error: exception Sys_error("swap: No such file or directory")

Without '-j 2' the command runs just fine.  I don't know what to make
of it.

> It looks like this also fixes a problem I ran into when doing the oideq
> work, which is that the patch for a header file would get shown multiple
> times (once for each file that includes it). So this is faster _and_
> more correct. Double-yay.

In that case I used to apply the change to the header first, and then
apply the semantic patch one more time.