Web lists-archives.com

[PATCH 0/4] Bloom filter experiment




To keep the ball rolling, here is my proof of concept in a somewhat
cleaned-up form, with still plenty of rough edges.

You can play around with it like this:

  $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
  Computing commit graph generation numbers: 100% (52801/52801), done.
  Computing bloom filter: 100% (52801/52801), done.
  # Yeah, I even added progress indicator! :)
  $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y git rev-list --count --full-history HEAD -- t/valgrind/valgrind.sh
  886
  20:40:24.783699 revision.c:486          bloom filter total queries: 66095 definitely not: 64953 maybe: 1142 false positives: 256 fp ratio: 0.003873

The value of $GIT_USE_POC_BLOOM_FILTER only really matters when writing
the Bloom filter, and it specifies the number of bits in the filter's
bitmap, IOW the above command creates a 8MB Bloom filter.  To make use
of the filter the variable can be anything non-empty.

Writing the Bloom filter is very slow as it is (yeah, that's why
bothered with the progress indicator ;).  I wrote about it in patch 2's
commit message: the cause for about half of the slowness is rather
obvious, but I don't (yet) know what's responsible for the other half.


Not a single test...  but I've run loops over all files in git.git
comparing 'git rev-list HEAD -- $file's output with and without the
Bloom filter, and, surprisingly, they match.  My quick'n'dirty
experiments usually don't fare this well...


It's also available at:

  https://github.com/szeder/git bloom-filter-experiment


SZEDER Gábor (4):
  Add a (very) barebones Bloom filter implementation
  commit-graph: write a Bloom filter containing changed paths for each
    commit
  revision.c: use the Bloom filter to speed up path-limited revision
    walks
  revision.c: add GIT_TRACE_BLOOM_FILTER for a bit of statistics

 Makefile       |   1 +
 bloom-filter.c | 103 +++++++++++++++++++++++++++++++++++++++
 bloom-filter.h |  39 +++++++++++++++
 commit-graph.c | 116 ++++++++++++++++++++++++++++++++++++++++++++
 pathspec.h     |   1 +
 revision.c     | 129 +++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 389 insertions(+)
 create mode 100644 bloom-filter.c
 create mode 100644 bloom-filter.h

-- 
2.19.1.409.g0a0ee5eb6b