Re: [PATCH 27/34] shortlog: release strbuf after use in insert_one_record()

On Fri, Sep 08, 2017 at 09:33:38AM +0900, Junio C Hamano wrote:

> >> An alternative, as this is the only place we add to log->list, could
> >> be to make log->list take ownership of the string by not adding a
> >> _release() here but instead _detach(), I guess.
> >
> > I agree that would be better, but I think it's a little tricky. The
> > string_list_insert() call may make a new entry, or it may return an
> > existing one. We'd still need to free in the latter case. I'm not sure
> > the string_list interface makes it easy to tell the difference.
>
> True; I do not think string_list API does.  But for this particular
> application, I suspect that we can by looking at the util field of
> the item returned.  A newly created one has NULL, but we always make
> it non-NULL before leaving this function.

Yeah, I agree that would work here.
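
To make the ownership idea concrete, here is a rough standalone sketch. It
does not use git's real string_list or strbuf: "struct item", "struct list",
list_insert(), dup_str() and record() are all hypothetical stand-ins, and the
list here does a linear search and simply borrows the pointer it is given.
The only point is the util check: a NULL util means the entry was just
created, so the list keeps our string; otherwise we free our copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT git's string_list API. */
struct item {
	char *string;
	void *util;
};

struct list {
	struct item *items;
	size_t nr, alloc;
};

/* Illustration-only helper: heap-allocated copy of a C string. */
static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);
	assert(p); /* sketch only: no real error handling */
	memcpy(p, s, n);
	return p;
}

/*
 * Return the entry for `s`. If none exists, create one that borrows
 * the pointer `s` and has util == NULL. Linear search for brevity.
 */
static struct item *list_insert(struct list *l, char *s)
{
	size_t i;

	for (i = 0; i < l->nr; i++)
		if (!strcmp(l->items[i].string, s))
			return &l->items[i];
	if (l->nr == l->alloc) {
		l->alloc = l->alloc ? 2 * l->alloc : 4;
		l->items = realloc(l->items, l->alloc * sizeof(*l->items));
		assert(l->items);
	}
	l->items[l->nr].string = s;
	l->items[l->nr].util = NULL;
	return &l->items[l->nr++];
}

/*
 * Count one record for `name` (heap-allocated; ownership passes to us).
 * Returns 1 if a new entry was created, 0 if an existing one was reused.
 */
static int record(struct list *l, char *name)
{
	struct item *it = list_insert(l, name);
	int is_new = !it->util;

	if (is_new) {
		/* new entry: the list now owns `name` */
		it->util = calloc(1, sizeof(size_t));
		assert(it->util);
	} else {
		/* existing entry: our copy is redundant */
		free(name);
	}
	++*(size_t *)it->util; /* util is non-NULL before we leave */
	return is_new;
}
```

This only works because util is guaranteed to be non-NULL by the time the
function returns, which is the property Junio pointed out.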

I also wondered if we could get away with avoiding the malloc entirely
here. Especially in the "shortlog -n" case, it is identical to the name
field we already have in ident.name. So ideally we'd do a lookup to see
if we have the entry before allocating anything (since we do one lookup
per commit, but only insert once per unique author).

But that doesn't quite work, because ident.name doesn't point to a
NUL-terminated string, and string_list only handles strings.

We _can_ reuse the same buffer over and over:

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 43c4799ea9..7328abf4a1 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -54,7 +54,7 @@ static void insert_one_record(struct shortlog *log,
 	struct string_list_item *item;
 	const char *mailbuf, *namebuf;
 	size_t namelen, maillen;
-	struct strbuf namemailbuf = STRBUF_INIT;
+	static struct strbuf namemailbuf = STRBUF_INIT;
 	struct ident_split ident;
 	if (split_ident_line(&ident, author, strlen(author)))
@@ -66,6 +66,7 @@ static void insert_one_record(struct shortlog *log,
 	maillen = ident.mail_end - ident.mail_begin;
 	map_user(&log->mailmap, &mailbuf, &maillen, &namebuf, &namelen);
+	strbuf_reset(&namemailbuf);
 	strbuf_add(&namemailbuf, namebuf, namelen);
 	if (log->email)

That saves the malloc, if not the extra copying. It shows about a 1%
speedup on "git shortlog -ns" on linux.git. Probably that's not worth
caring too much about, but it also "solves" the leaking problem (I'm not
sure if the speedup is from calling malloc less frequently, or from
lowering our peak heap usage due to fixing the leak).
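
For anyone unfamiliar with the pattern, here is a minimal standalone sketch
of "reset, don't release" with a function-local static buffer. It is not
git code: "struct buf", buf_reset(), buf_add() and name_key() are
hypothetical stand-ins for strbuf and friends. It also shows how a
(pointer, length) slice that is not NUL-terminated gets turned into a proper
C string for lookup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal stand-in for a growable buffer; NOT git's strbuf. */
struct buf {
	char *data;
	size_t len, alloc;
};

/* Forget the contents but keep the allocation for reuse. */
static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->data)
		b->data[0] = '\0';
}

/* Append n bytes from src, keeping the buffer NUL-terminated. */
static void buf_add(struct buf *b, const char *src, size_t n)
{
	if (b->len + n + 1 > b->alloc) {
		b->alloc = 2 * (b->len + n + 1);
		b->data = realloc(b->data, b->alloc);
		assert(b->data); /* sketch only: no real error handling */
	}
	memcpy(b->data + b->len, src, n);
	b->len += n;
	b->data[b->len] = '\0';
}

/*
 * Return a NUL-terminated copy of name[0..len). The result lives in a
 * static buffer and is only valid until the next call.
 */
static const char *name_key(const char *name, size_t len)
{
	static struct buf key; /* zero-initialized, persists across calls */

	buf_reset(&key);
	buf_add(&key, name, len);
	return key.data;
}
```

The usual caveats apply: the function is no longer reentrant or thread-safe,
and the buffer is never freed (it stays reachable, so it is not a leak in the
"lost memory" sense).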