
[PATCH v1 0/4] Speed up index load through parallelization




This patch series helps address the CPU cost of loading the index by adding
a table of contents extension to the index that allows us to multi-thread
the loading and conversion of cache entries from the on-disk format to the
in-memory format.  This is particularly beneficial with large indexes and
V4 indexes, which have more CPU cost due to the prefix-encoding.

I wanted to get feedback on the concept, as two parts of it are a bit of a
clever hack (see below): the way I'm adding the table of contents
information via an extension that can be read before the variable length
section of cache entries and other extensions, and the resetting of the
prefix encoding for V4 indexes.  Both, however, are entirely backwards
compatible with older versions of git, which can still properly read and
use the index.
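
To illustrate why resetting the prefix encoding matters, here is a minimal
sketch (not code from the patch) of V4-style path reconstruction: each path
is stored as a number of bytes to strip from the previous path plus a suffix
to append. The function name, the fixed-size buffer and the plain size_t
strip count (the real format uses a variable-width integer) are assumptions
made for the example. If the first entry of a block is written with a strip
count of 0 against an empty previous path, a thread can presumably start
decoding at that block without having seen any earlier entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: rebuild the current path in place from the
 * previous one.  `path` holds the previous path on entry and the new
 * path on success; `cap` is the size of the buffer behind `path`.
 * Returns 0 on success, -1 if the delta does not fit or is invalid.
 */
int apply_prefix_delta(char *path, size_t cap, size_t strip,
		       const char *suffix)
{
	size_t len, add;

	assert(path && suffix);
	len = strlen(path);
	add = strlen(suffix);

	/* Cannot strip more bytes than the previous path has. */
	if (strip > len)
		return -1;
	/* New path plus its terminating NUL must fit in the buffer. */
	if (len - strip + add + 1 > cap)
		return -1;

	/* Overwrite the stripped tail with the suffix (including NUL). */
	memcpy(path + len - strip, suffix, add + 1);
	return 0;
}
```

Starting from an empty buffer (the "reset" state), applying (0,
"Documentation/config.txt") and then (10, "git.txt") yields
"Documentation/git.txt"; a decoder that begins in the middle of an
unreset run would have no previous path to strip from.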

I'm not particularly fond of the names "fastindex" and "IEOT." I've
wondered if "indextoc" and "Index Table of Contents (ITOC)" would be
better names but I'm open to suggestions.

As there is overhead to spinning up a thread, there is logic to only
do the index loading in parallel when there are enough entries for it
to help (currently set at 7,500 per thread with a minimum of 2 threads).
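
As a rough sketch of that threshold (my reading of the description, not the
code in read-cache.c): divide the entry count by 7,500 and fall back to a
single-threaded load unless that gives at least 2 threads. The constant
name, the function name and the max_threads cap are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Value quoted above; the macro name is made up for this example. */
#define ENTRIES_PER_THREAD 7500

/*
 * Return how many loader threads to use for `nr_entries` cache
 * entries.  A result of 1 means "load on the calling thread".
 * `max_threads` is an assumed upper bound, e.g. the number of cores.
 */
int pick_thread_count(size_t nr_entries, int max_threads)
{
	size_t wanted;

	assert(max_threads >= 1);
	wanted = nr_entries / ENTRIES_PER_THREAD;

	/* Fewer than two full shares of work: threading would not pay off. */
	if (wanted < 2)
		return 1;
	if (wanted > (size_t)max_threads)
		wanted = (size_t)max_threads;
	return (int)wanted;
}
```

Under these assumptions an index with 10,000 entries loads on one thread,
15,000 entries use 2 threads, and 100,000 entries on an 8-core machine use
8 threads.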

The impact of the change can be seen using t/helper/test-read-cache:

                                fastindex
test            count   files   TRUE    FALSE   Savings
------------------------------------------------------------------------
test-read-cache 500     100K    6.39    8.33    23.36%
test-read-cache 100     1M      12.49   18.68   33.12%

The on-disk format looks like this:

Index header
Cache entry 1
Cache entry 2
.
.
.
Extension 1
Extension 2
.
.
Index Entry Offset Table Extension (must be written last!)
	IEOT signature bytes
	32-bit size
	32-bit version
	32-bit Cache Entry Offset 1
	32-bit Cache Entry count
	32-bit Cache Entry Offset 2
	32-bit Cache Entry count
	.
	.
	.
	32-bit version
	32-bit size
	IEOT signature bytes
SHA1
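
Because the version, size and signature are repeated in reverse order just
before the trailing SHA1, a reader can locate the table from the end of the
file without walking the cache entries first. The sketch below shows one way
that lookup could work; it is not the code from the patch. It assumes a
20-byte SHA-1, a 4-byte "IEOT" signature and big-endian 32-bit fields (as
elsewhere in the index), and the function names are made up. What the size
field counts is not interpreted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20	/* trailing SHA-1 of the index file */
#define SIG_LEN 4	/* "IEOT" */

/* Read a 32-bit big-endian (network order) integer. */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical lookup: `buf`/`len` is the whole index file in memory.
 * Returns 0 and fills in *version and *size if the bytes just before
 * the trailing hash look like an IEOT trailer, -1 otherwise.
 */
int find_ieot_trailer(const unsigned char *buf, size_t len,
		      uint32_t *version, uint32_t *size)
{
	const unsigned char *sig;

	assert(buf && version && size);

	/* Need room for version + size + signature + hash. */
	if (len < HASH_LEN + SIG_LEN + 8)
		return -1;

	sig = buf + len - HASH_LEN - SIG_LEN;
	if (memcmp(sig, "IEOT", SIG_LEN))
		return -1;	/* extension absent: load single-threaded */

	*size = read_be32(sig - 4);
	*version = read_be32(sig - 8);
	return 0;
}
```

An index without the trailer makes the lookup return -1, which would map
naturally onto the existing single-threaded load path.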

Signed-off-by: Ben Peart <benpeart@xxxxxxxxxxxxx>

Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/1146d38932
Checkout: git fetch https://github.com/benpeart/git fastindex-v1 && git checkout 1146d38932

Ben Peart (4):
  fastindex: speed up index load through parallelization
  update-index: add fastindex support to update-index
  fastindex: add test tools and a test script
  fastindex: add documentation for the fastindex extension

 Documentation/config.txt                 |   8 +
 Documentation/git-update-index.txt       |  11 +
 Documentation/technical/index-format.txt |  26 +++
 Makefile                                 |   2 +
 builtin/update-index.c                   |  22 ++
 cache.h                                  |  25 +++
 config.c                                 |  20 ++
 config.h                                 |   1 +
 environment.c                            |   3 +
 read-cache.c                             | 343 +++++++++++++++++++++++++++++--
 t/helper/test-dump-fast-index.c          |  68 ++++++
 t/helper/test-fast-index.c               |  84 ++++++++
 t/t1800-fast-index.sh                    |  55 +++++
 13 files changed, 647 insertions(+), 21 deletions(-)
 create mode 100644 t/helper/test-dump-fast-index.c
 create mode 100644 t/helper/test-fast-index.c
 create mode 100644 t/t1800-fast-index.sh


base-commit: 7668cbc60578f99a4c048f8f8f38787930b8147b
-- 
2.15.0.windows.1