[PATCH] repository.c: always allocate 'index' at repo init time

There are two ways a 'struct repository' can be initialized before
use: via initialize_the_repository() and via repo_init().

The first way always initializes the 'index' field, because that is how
things worked before 'struct repository' was introduced. Back then
'the_index' was always available (even if not loaded). The second way,
however, leaves 'index' NULL and relies on repo_read_index() to allocate
it on demand.

The problem with the second way is that the majority of our code base
was written with 'the_index' (i.e. the first way) in mind, where
dereferencing 'the_index' (or the 'index' field now) is always safe.

The second way breaks this assumption. The 'index' field stays NULL
until the index is loaded from disk, which can lead to segfaults like
the one fixed in 581d2fd9f2 (get_oid: handle NULL repo->index,
2019-05-14).
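
To illustrate the hazard, here is a minimal standalone sketch of the
on-demand scheme. It is not git code: 'lazy_index', 'lazy_repo' and the
lazy_*() helpers are hypothetical stand-ins, and plain calloc() is used
in place of git's xcalloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for 'struct index_state' and 'struct repository'. */
struct lazy_index {
	unsigned int cache_nr;
};

struct lazy_repo {
	struct lazy_index *index;
};

/* The "second way": zero the struct and leave 'index' NULL. */
void lazy_repo_init(struct lazy_repo *repo)
{
	memset(repo, 0, sizeof(*repo));
}

/* 'index' is only allocated once somebody asks to read it. */
int lazy_repo_read_index(struct lazy_repo *repo)
{
	if (!repo->index) {
		repo->index = calloc(1, sizeof(*repo->index));
		if (!repo->index)
			return -1;
	}
	return 0; /* the sketch does not read anything from disk */
}

/*
 * Code written with 'the_index' in mind would dereference
 * repo->index unconditionally. With lazy allocation, every such
 * caller needs a guard like this one to avoid a NULL dereference.
 */
unsigned int lazy_entry_count(const struct lazy_repo *repo)
{
	return repo->index ? repo->index->cache_nr : 0;
}
```

Any caller that forgets the guard in lazy_entry_count() crashes if it
runs before lazy_repo_read_index().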

We have two options to handle this: either we audit the entire code
base, adding 'is index NULL' checks where needed, or we make sure
'index' is never NULL to begin with.

This patch goes with the second option, making sure that 'index' is
always allocated after initialization. It's less effort than the first
one, and also safer because you could still miss things during the code
audit. The extra allocation cost is not a real concern.
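
The resulting invariant can be sketched outside of git as follows. This
is only an illustration under stated assumptions: 'eager_index',
'eager_repo' and the eager_*() helpers are hypothetical names,
eager_bug() stands in for BUG(), and calloc() plus an explicit check
stands in for xcalloc().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for 'struct index_state' and 'struct repository'. */
struct eager_index {
	unsigned int cache_nr;
};

struct eager_repo {
	struct eager_index *index;
};

/* Stand-in for BUG(): report a programming error and abort. */
static void eager_bug(const char *msg)
{
	fprintf(stderr, "BUG: %s\n", msg);
	abort();
}

/* Allocate 'index' up front so it is never NULL after init. */
void eager_repo_init(struct eager_repo *repo)
{
	memset(repo, 0, sizeof(*repo));
	repo->index = calloc(1, sizeof(*repo->index));
	if (!repo->index) {
		perror("calloc");
		exit(1);
	}
}

/* A NULL 'index' now means the repo was never set up. */
int eager_repo_read_index(struct eager_repo *repo)
{
	if (!repo->index)
		eager_bug("the repo hasn't been setup");
	return 0; /* the sketch does not read anything from disk */
}

/* Callers may dereference 'index' without a NULL check. */
unsigned int eager_entry_count(const struct eager_repo *repo)
{
	return repo->index->cache_nr;
}

/* After this call the repo must be reinitialized before reuse. */
void eager_repo_clear(struct eager_repo *repo)
{
	free(repo->index);
	repo->index = NULL;
}
```

With this shape, eager_entry_count() is safe between eager_repo_init()
and eager_repo_clear() whether or not the index has been read.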

The 'index' field is still freed and reset to NULL in repo_clear(). But
after that call, a lot more is missing in 'repo' and it can never be
used again without going through reinitialization, so this should be
safe.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 repository.c | 3 ++-
 repository.h | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/repository.c b/repository.c
index 682c239fe3..ca58692504 100644
--- a/repository.c
+++ b/repository.c
@@ -160,6 +160,7 @@ int repo_init(struct repository *repo,
 	struct repository_format format = REPOSITORY_FORMAT_INIT;
 	memset(repo, 0, sizeof(*repo));
+	repo->index = xcalloc(1, sizeof(*repo->index));
 	repo->objects = raw_object_store_new();
 	repo->parsed_objects = parsed_object_pool_new();
@@ -262,7 +263,7 @@ void repo_clear(struct repository *repo)
 int repo_read_index(struct repository *repo)
 {
 	if (!repo->index)
-		repo->index = xcalloc(1, sizeof(*repo->index));
+		BUG("the repo hasn't been setup");
 	return read_index_from(repo->index, repo->index_file, repo->gitdir);
diff --git a/repository.h b/repository.h
index 4fb6a5885f..75c4f68b22 100644
--- a/repository.h
+++ b/repository.h
@@ -85,6 +85,7 @@ struct repository {
 	/*
 	 * Repository's in-memory index.
+	 * Cannot be NULL after initialization.
 	 * 'repo_read_index()' can be used to populate 'index'.
 	 */
 	struct index_state *index;
@@ -132,6 +133,9 @@ struct submodule;
 int repo_submodule_init(struct repository *subrepo,
 			struct repository *superproject,
 			const struct submodule *sub);
+/*
+ * Release all resources in 'repo'. 'repo' cannot be used again.
+ */
 void repo_clear(struct repository *repo);