
Re: [PATCH] sha1_file: use strbuf_add() instead of strbuf_addf()

On Fri, Dec 01, 2017 at 02:50:05PM -0500, Derrick Stolee wrote:

> > > +	baselen = path->len;
> > We set this here so that the '/' is included as part of the base. Makes
> > sense, but can we now drop the earlier setting of baselen before the
> > opendir() call?
> Yeah, probably. I had briefly considered just adding the '/' before the
> first assignment of "baselen", but didn't want to change the error output. I
> also don't know if there are side effects for other platforms by calling
> opendir() with a '/'-terminated path.

I noticed that, too. Since it's so easy to keep doing the opendir
without the slash, I'd prefer to avoid finding out if there are such
platforms. :)
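
To make the ordering concrete, here is a minimal sketch of "opendir first, slash second". The helper name open_then_add_slash() and the plain char buffer are made up for illustration; the real code works on a strbuf.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy "dir" (given without a trailing slash) into
 * "path", open it, and only then append '/'. On success, *baselen is the
 * length of path including the slash. Returns NULL if the buffer is too
 * small or opendir() fails; in the opendir() failure case path still
 * holds the slash-less name, so an error message would look unchanged.
 */
static DIR *open_then_add_slash(char *path, size_t pathsz,
				const char *dir, size_t *baselen)
{
	size_t len = strlen(dir);
	DIR *d;

	if (len + 2 > pathsz)
		return NULL; /* no room for '/' plus NUL */
	memcpy(path, dir, len + 1);

	d = opendir(path); /* called on the slash-less path */
	if (!d)
		return NULL;

	path[len++] = '/';
	path[len] = '\0';
	*baselen = len; /* the base now includes the '/' */
	return d;
}
```

The caller would then reset to baselen before appending each entry name, and closedir() the stream when done.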

> Good catch! A big reason to pull it inside and use strbuf_add over
> strbuf_addstr is to avoid a duplicate strlen() calculation. However, I can
> store the length before the conditional.

I'd give 50/50 odds on whether a compiler could optimize out that
strlen. We inline addstr exactly so that callsites can see that strlen
(it's primarily for string literals, where it can become a compile-time
constant, but I think it could apply here). But sometimes C's pointer
aliasing rules can be surprising in blocking "obviously correct"
optimizations like that.
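
As a rough illustration of the pattern in question (a toy buffer type, not git's actual strbuf; all of the toybuf names are invented for this sketch):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for a strbuf. */
struct toybuf {
	char buf[256];
	size_t len;
};

/* Append exactly n bytes; the caller supplies the length. */
static void toybuf_add(struct toybuf *b, const char *s, size_t n)
{
	size_t room = sizeof(b->buf) - 1 - b->len;

	if (n > room)
		n = room; /* this toy version truncates instead of growing */
	memcpy(b->buf + b->len, s, n);
	b->len += n;
	b->buf[b->len] = '\0';
}

/*
 * Inline wrapper: the strlen() is visible at each call site, so it can
 * become a compile-time constant for string literals.
 */
static inline void toybuf_addstr(struct toybuf *b, const char *s)
{
	toybuf_add(b, s, strlen(s));
}

/* Append name to b, then report whether it has the wanted length. */
static int append_and_check(struct toybuf *b, const char *name, size_t want)
{
	toybuf_addstr(b, name);      /* first strlen(name), via the wrapper */
	return strlen(name) == want; /* second strlen(name) */
}
```

Whether the two strlen() calls collapse into one depends on whether the compiler can prove that the write into b->buf did not touch the bytes behind "name". Since both are char pointers, that is exactly where the aliasing rules can get in the way.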

The generated asm is a little dense, but I _think_ "gcc -O2" does in
fact do this with a single strlen based on the following tweak on top of
your patch:

diff --git a/sha1_file.c b/sha1_file.c
index 2160323c4a..f234519744 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1921,11 +1921,12 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
+		strbuf_setlen(path, baselen);
+		strbuf_addstr(path, de->d_name);
 		if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2 &&
 		    !hex_to_bytes(oid.hash + 1, de->d_name,
 				  GIT_SHA1_RAWSZ - 1)) {
-			strbuf_setlen(path, baselen);
-			strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2);
 			if (obj_cb) {
 				r = obj_cb(&oid, path->buf, data);
 				if (r)

Not that I overly mind the manual assignment of the strlen result in
this particular case. But I'm a curious fellow by nature, and knowing
these kinds of answers helps us build up an accurate gut instinct for
future cases.

> Small change by storing the length in advance of the conditional:
> while (de = readdir(...)) {
>     int namelen = strlen(de->d_name);
>     strbuf_setlen(path, baselen);
>     strbuf_add(path, de->d_name, namelen);
>     if (namelen == HEXSZ - 2)
>         obj_cb(path->buf)
>     else
>         cruft_cb(path->buf);
> }

Yup, I don't mind that approach either, but do please use size_t to
store the result of strlen (I know it's nearly impossible to overflow in
this case, but I've been trying to push the codebase in that direction
slowly over time).
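
Spelled out a bit more, the loop could look something like the sketch below. It is only an approximation under a few assumptions: it walks an array of names instead of calling readdir(), uses a plain char buffer rather than a strbuf, skips the hex_to_bytes() validation, and HEXSZ, walk_names() and count_cb() are stand-in names.

```c
#include <stddef.h>
#include <string.h>

#define HEXSZ 40 /* assumed: length of a hex SHA-1, like GIT_SHA1_HEXSZ */

typedef void (*path_cb)(const char *path, void *data);

/* Example callback: counts how many times it was invoked. */
static void count_cb(const char *path, void *data)
{
	(void)path;
	++*(size_t *)data;
}

/*
 * "path" holds "<dir>/" in its first baselen bytes. For each name, reset
 * to the base, append the name, and dispatch on its length. Returns the
 * number of names treated as objects.
 */
static size_t walk_names(char *path, size_t pathsz, size_t baselen,
			 const char *const *names, size_t nr,
			 path_cb obj_cb, path_cb cruft_cb, void *data)
{
	size_t i, nr_objects = 0;

	for (i = 0; i < nr; i++) {
		size_t namelen = strlen(names[i]); /* one strlen, kept as size_t */

		if (baselen >= pathsz || namelen >= pathsz - baselen)
			continue; /* would not fit; skipped in this sketch */

		/* overwrite everything after the base with the new name */
		memcpy(path + baselen, names[i], namelen);
		path[baselen + namelen] = '\0';

		if (namelen == HEXSZ - 2) {
			nr_objects++;
			if (obj_cb)
				obj_cb(path, data);
		} else if (cruft_cb) {
			cruft_cb(path, data);
		}
	}
	return nr_objects;
}
```

Keeping namelen as a size_t means the comparison and the copy both use the type strlen() actually returns, with no narrowing conversion in between.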

> >    - there's an extra micro-optimization there, which is that if there's
> >      no obj_cb, we have no need to assemble the full path at all. I doubt
> >      it makes much of a difference, as most callers would pass an object
> >      callback (I'd be surprised if we even have one that doesn't).
> After doing a few 'git grep' commands, I found several that include a NULL
> cruft_cb but none that have a NULL obj_cb.

Yeah, that agrees with my cursory look.