Re: [PATCH 2/2] packfile: refactor hash search with fanout table

On Fri, 9 Feb 2018 19:03:48 +0100
René Scharfe <l.s.r@xxxxxx> wrote:

> Going from unsigned to signed int means the patch breaks support for
> more than 2G pack entries, which was put with 326bf39677 (Use uint32_t
> for all packed object counts.) in 2007.

Ah, good catch. I'll wait to see if there are any more comments, then
send out a new version.
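
To make the concern concrete, here is a minimal sketch (not the patch
itself; midpoint_u32 is a made-up name) of keeping the search bounds in
uint32_t so that object counts above 2^31 still work:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: midpoint of the half-open range [lo, hi).
 * Both bounds stay uint32_t, so indexes above INT_MAX are fine, and
 * "lo + (hi - lo) / 2" cannot wrap the way "(lo + hi) / 2" could.
 */
static uint32_t midpoint_u32(uint32_t lo, uint32_t hi)
{
	assert(lo <= hi);
	return lo + (hi - lo) / 2;
}
```

With a signed 32-bit int, a count such as 3000000000 is not
representable at all, which is the regression pointed out above.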

> > +int bsearch_hash(const unsigned char *sha1, const void *fanout_,
> > +		 const void *table_, size_t stride)
> > +{
> > +	const uint32_t *fanout = fanout_;
>
> Why hide the type?  It doesn't make the function more generic.

I thought that the fanout_ parameter could come from a variety of
sources (e.g. direct mmap - void *, or mmap with some pointer arithmetic
- char *), so I just picked the generic one. But now I realize that this
could lead to unaligned reads, which is probably not a good idea. I'll
update it.
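
As an aside, if a caller only had a byte pointer of unknown alignment,
one alternative (not what I plan to do in the patch) would be to
assemble each 32-bit entry byte by byte. A rough sketch, where
fanout_entry is a hypothetical name and the entries are assumed to be
stored in network byte order, as in the pack index:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read the i-th big-endian 32-bit entry of a
 * fanout table through a byte pointer. Only single bytes are loaded,
 * so this is valid at any alignment.
 */
static uint32_t fanout_entry(const unsigned char *fanout, size_t i)
{
	const unsigned char *p;

	assert(fanout);
	p = fanout + 4 * i;
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Declaring the parameter with its real type instead pushes the
alignment requirement onto the caller, where the compiler can see it.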

For consistency, I'll also update table_ to be unsigned char *.
(Unsigned because it is primarily interpreted as hashes, which use
"unsigned char *" in the Git code.)

> Why not use sha1_pos()?  I guess because it avoids the overhead of the
> accessor function, right?  And I wonder how much of difference it makes.

Yes, it avoids the overhead of the accessor function. We would also
need to modify sha1_pos() to take a function that we can pass userdata
to (to contain the stride).
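
For illustration only, an accessor-based search that carries the stride
as userdata might look roughly like the sketch below. All names here
(hash_access_fn, strided_table, strided_access, hash_pos_sketch) and the
20-byte hash length are assumptions made for the example; this is not
the existing sha1_pos() interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20 /* assumed SHA-1 length for this sketch */

/* Hypothetical accessor: returns the hash stored at "index". */
typedef const unsigned char *hash_access_fn(uint32_t index, void *data);

/* Hypothetical userdata: where the table starts and how wide a row is. */
struct strided_table {
	const unsigned char *base;
	size_t stride;
};

static const unsigned char *strided_access(uint32_t index, void *data)
{
	const struct strided_table *t = data;
	return t->base + (size_t)index * t->stride;
}

/*
 * Returns 1 and stores the index in *result if found; otherwise
 * returns 0 and stores the insertion position.
 */
static int hash_pos_sketch(const unsigned char *hash, uint32_t nr,
			   hash_access_fn *fn, void *data, uint32_t *result)
{
	uint32_t lo = 0, hi = nr;

	assert(fn && result);
	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		/* One indirect call per probe: the overhead in question. */
		int cmp = memcmp(fn(mi, data), hash, HASH_LEN);

		if (!cmp) {
			*result = mi;
			return 1;
		}
		if (cmp > 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	*result = lo;
	return 0;
}
```

Every probe goes through the function pointer, whereas a dedicated
function can compute "table + mi * stride" inline.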

> A binary search function for embedded hashes just needs the key, a
> pointer to the first hash in the array, the stride and the number of
> elements.  It can then be used with or without a fanout table, making it
> more versatile.  Just a thought.

I specifically want to include the fanout table in the calculation here,
because it will be used by subsequent patches that also incorporate the
fanout table.
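
To show what I mean by including the fanout table in the calculation,
here is a simplified sketch rather than the actual patch. It assumes
20-byte hashes, a 256-entry fanout table already converted to host byte
order (the on-disk table is in network byte order, so real code would
need to convert), and an out-parameter for the position;
bsearch_hash_sketch is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20 /* assumed SHA-1 length for this sketch */

/*
 * fanout[b] is the number of entries whose first byte is <= b, so the
 * entries starting with byte b occupy [fanout[b - 1], fanout[b]).
 * "table" points at the first hash; consecutive hashes are "stride"
 * bytes apart. Returns 1 and stores the index in *result if found;
 * otherwise returns 0 and stores the insertion position.
 */
static int bsearch_hash_sketch(const unsigned char *hash,
			       const uint32_t *fanout,
			       const unsigned char *table, size_t stride,
			       uint32_t *result)
{
	/* The fanout table narrows the range before any comparison. */
	uint32_t lo = hash[0] ? fanout[hash[0] - 1] : 0;
	uint32_t hi = fanout[hash[0]];

	assert(stride >= HASH_LEN);
	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(table + (size_t)mi * stride, hash, HASH_LEN);

		if (!cmp) {
			*result = mi;
			return 1;
		}
		if (cmp > 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	*result = lo;
	return 0;
}
```

The loop itself is the plain strided search suggested above; only the
initial lo and hi come from the fanout table.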