Re: [RFC/WIP PATCH] object store classification

On 07/10, Stefan Beller wrote:
> On Fri, Jul 7, 2017 at 9:50 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > Ben Peart <peartben@xxxxxxxxx> writes:
> >
> >> For more API/state design purity, I wonder if there should be an
> >> object_store structure that is passed to each of the object store APIs
> >> instead of passing the repository object. The repository object could
> >> then contain an instance of the object_store structure.
> >>
> >> That said, I haven't taken a close look at all the code in object.c to
> >> see if all the data needed can be cleanly abstracted into an
> >> object_store structure.
> >
> > My gut feeling was it is just the large hashtable that keeps track of
> > objects we have seen, but the object replacement/grafts and other
> > things may also want to become per-repository.
> This is similar to the_index, which is referenced by the_repository.
> But as we do not have anything like the_object_store already, we are
> free to design it however we like, since the work that needs to be put
> in is the same either way.
> With the object replacements/grafts coming up as well as alternates,
> we definitely want that to be per repository; the question is whether
> we would rather want
>   the_repository -> many object_stores (one for each alternate, grafts,
>       and the usual place at $GIT_DIR/objects)
>   where each object_store is a hashmap, maybe with an additional
>   describing string or path.
> or
>   the_repository -> the_object_store
>   but the object store is a complex beast having different hash tables
>   for the different alternates.

After looking at the patch and some of the comments here, I think that
this is probably the best approach, with a few tweaks (which may be
completely unfounded because I'm not familiar with all the object store
code).

In an OO world I would envision a single object (let's say 'struct
object_store') which is responsible for managing a repository's objects
no matter where the individual objects came from (the main object store
or an alternate for that repository).  And, if I understand correctly,
the single hash table that exists now already caches objects this way.

I also think that such a 'struct object_store' should probably be an
opaque type to the majority of the code base.  This means that its
definition probably shouldn't live in 'repository.h'.

As far as the API goes, I think it should be similar to the new
repo_config API (and the old config API too, though there the repository
was implicit): the code base doesn't need to know about 'struct
configset'; it just passes a pointer to the repository, and the 'struct
configset' stored in the repository is operated on under the hood.  This
way the code base would just query for an object using the repository
as a handle, like:

  get_object(repo, OID);

and not:

  get_object(repo->object_store, OID);

Of course under the hood it would be preferable to have the functions
operate on the object_store struct explicitly.

> or
>   the_repository -> the_object_store_hash_map
>   which is this patch that would try to put any object related to this
>   repository into the same hashmap and the hashmap is not special
>   for each of the different object locations.
> >
> >> One concern I have is that the global state refactoring effort will
> >> just result in all the global state getting moved into a single
> >> (global) repository object, thus limiting its usefulness.

I think we do need to think about this, but it shouldn't be too much of
a concern right now.  The first step is to make enough of the object
store object-oriented that you can have two object stores, corresponding
to two different repositories, working in parallel.

> >
> > I actually am not worried about it that much, and I say this with
> > the background of having done the same "grouping a set of global
> > state variables into a single structure and turning them into a
> > single default instance" for the_index.  Whether you like it or not,
> > the majority of operations do work on the default instance---that
> > was why the operations could live with just "a set of global state
> > variables" in the first place, and the convenience compatibility
> > macros that allow you to operate on the fields of the default
> > instance as if they were separate variables have been a huge
> > typesaver that also reduces the cognitive burden.  I'd expect that
> > the same will hold for the new "repository" and the "object_store"
> > abstractions.
> Sounds reasonable to expect.
> Thanks,
> Stefan

Brandon Williams