Web lists-archives.com

Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

On Tue, Aug 29, 2017 at 07:18:22PM +0200, Martin Ågren wrote:

> Observe that split_symref_update() creates a `new_update`-object through
> ref_transaction_add_update(), after which `new_update->refname` is a
> copy of `referent`. The difference is, this copy will be freed, and it
> will be freed *after* `affected_refnames` has been cleared.
> Rearrange the handling of `referent`, so that we don't add it directly
> to `affected_refnames`. Instead, first just check whether `referent`
> exists in the string list, and later add `new_update->refname`.

Coincidentally[1] I came across this same leak, and my solution ended up
slightly different. I'll share it here in case it's of interest.

In your solution we end up searching the string list twice: once to see
if we have the item, and then again to insert it. Whereas in the
original we did both with a single search.

But we can observe that either:

  1. The item already existed, in which case our insert was a noop, and
     we're good.


  2. We inserted it, in which case we proceed with creating new_update.

     We can then in O(1) replace the pointer in the string list item
     with the storage in new_update. We know we're not violating any
     string_list invariants because the strings contain the same bytes.


diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9266f5ab9d..1d16c1b33e 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2178,6 +2178,11 @@ static int split_symref_update(struct files_ref_store *refs,
 	update->flags |= REF_LOG_ONLY | REF_NODEREF;
 	update->flags &= ~REF_HAVE_OLD;
+	/*
+	 * Re-point at the storage provided by our ref_update, which we know
+	 * will last as long as the affected_refnames list.
+	 */
+	item->string = new_update->refname;
 	item->util = new_update;
 	return 0;

It feels pretty dirty, though. It would certainly be a bug if we ever
decided to switch affected_refnames to duplicate its strings.

So given that your solution is only a constant-time factor worse in
efficiency, we should probably prefer it as the more maintainable


[1] It's not really a coincidence, of course. All the recent leak
    discussion has got both of us prodding at Git with various tools. :)