
Re: reftable [v5]: new ref storage format




On Sun, Aug 06 2017, Shawn Pearce jotted:

> 5th iteration of the reftable storage format.

I haven't kept up with all of the discussion, so apologies if these
comments repeat something that has already been mentioned.

> ### Version 1
>
> A repository must set its `$GIT_DIR/config` to configure reftable:
>
>     [core]
>         repositoryformatversion = 1
>     [extensions]
>         reftable = true

David Turner's LMDB proposal specified an extensions.refStorage config
variable instead. I think that is a much better idea, i.e. to have
'extensions.refStorage = reftable' instead of 'extensions.reftable =
true'; cf. the mistake we already made with grep.extendedRegexp &
grep.patternType.

If we grow another storage backend this will become messy, and it won't
be obvious to the user that the settings are mutually exclusive (which
they surely will be). We'd end up having to special-case it the way we
do for grep.[extendedRegexp,patternType], i.e. either make one override
the other or make specifying >1 an error, which is a hassle with the
config API.
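
To illustrate the difference, here is a rough Python sketch (not git
code). The helper names, the plain dict standing in for the parsed
[extensions] section, and the "files"/"lmdb" backend names are all my
own assumptions for the sake of the example:

```python
# Hypothetical sketch: none of this is git's actual config API.
# `extensions` stands in for the parsed [extensions] section,
# with lower-cased keys mapping to their string values.
KNOWN_BACKENDS = {"files", "reftable", "lmdb"}  # names assumed for illustration


def backend_from_booleans(extensions):
    """One boolean key per backend: exclusivity has to be checked by hand."""
    enabled = sorted(
        name for name in KNOWN_BACKENDS - {"files"}
        if extensions.get(name) == "true"
    )
    if len(enabled) > 1:
        raise ValueError("conflicting ref backends: " + ", ".join(enabled))
    return enabled[0] if enabled else "files"


def backend_from_refstorage(extensions):
    """One key naming the backend: two backends cannot be selected at once."""
    value = extensions.get("refstorage", "files")
    if value not in KNOWN_BACKENDS:
        raise ValueError("unknown ref backend: " + value)
    return value
```

With the boolean scheme every new backend adds another key that has to
be cross-checked against all the others; with the single key the
conflict cannot be expressed in the first place.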

> Performance testing indicates reftable is faster for lookups (51%
> faster, 11.2 usec vs.  5.4 usec), although reftable produces a
> slightly larger file (+ ~3.2%, 28.3M vs 29.2M):
>
> format    |  size  | seek cold | seek hot  |
> ---------:|-------:|----------:|----------:|
> mh-alt    | 28.3 M | 23.4 usec | 11.2 usec |
> reftable  | 29.2 M | 19.9 usec |  5.4 usec |
>
> [mh-alt]: https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@xxxxxxxxxxxxxx/
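
For reference, the quoted percentages can be re-derived from the table
with a few lines of Python. This is just arithmetic on the numbers
above, and the helper name is mine:

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Figures copied from the quoted table (mh-alt first, reftable second).
seek_hot_reduction = -percent_change(11.2, 5.4)    # hot seek time, usec
seek_cold_reduction = -percent_change(23.4, 19.9)  # cold seek time, usec
size_growth = percent_change(28.3, 29.2)           # file size, M

print(f"seek hot:  {seek_hot_reduction:.1f}% less time")
print(f"seek cold: {seek_cold_reduction:.1f}% less time")
print(f"size:      {size_growth:.1f}% larger")
```

So "faster" here means the reduction in hot seek time relative to
mh-alt; the cold seek difference is considerably smaller.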

It might be worth noting that these numbers are based on the WIP Java
implementation. I started searching for patches for this new format and
found via
<CAJo=hJtrdCOF-RxzXfyLx7R-1f2-7pZVO_UOg28J=wUDNdf3yw@xxxxxxxxxxxxxx>
that it's JGit only.

Also, if one wanted to run these tests via JGit using your WIP code,
where does that code live, and how would one test it?

> ### LMDB
>
> David Turner proposed [using LMDB][dt-lmdb], as LMDB is lightweight
> (64k of runtime code) and GPL-compatible license.
>
> A downside of LMDB is its reliance on a single C implementation.  This
> makes embedding inside JGit (a popular reimplementation of Git)
> difficult, and hoisting onto virtual storage (for JGit DFS) virtually
> impossible.

This rationale as stated reads a bit too much like https://xkcd.com/927/

I.e. surely the actual problem isn't that there's a single C
implementation of LMDB, since that's one more C implementation than
currently exists of this new format.

Also, isn't this info out of date now that
https://github.com/lmdbjava/lmdbjava exists? That project was
implemented after David's initial LMDB patches on-list, but I don't know
if it implements the subset of the LMDB format needed for his proposed
ref storage.

Surely the actual rationale is rather something like:

    A downside of LMDB is that it would be too complex to implement the
    subset of its database format needed for this reference storage in
    Java in the nascent lmdbjava project and to keep the two compatible
    going forward while juggling support for two upstream projects whose
    aims may conflict with ours.

Or:

    A downside of LMDB is <above rationale> and, even if we did that,
    benchmarks <do we have those?> show that it wouldn't be worth using
    the LMDB format since it's slower/bigger/whatever.

> A common format that can be supported by all major Git implementations
> (git-core, JGit, libgit2) is strongly preferred.
>
> [dt-lmdb]: https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@xxxxxxxxxxxxxxxx/
>
> ## Future
>
> ### Longer hashes
>
> Version will bump (e.g.  2) to indicate `value` uses a different
> object id length other than 20.  The length could be stored in an
> expanded file header, or hardcoded as part of the version.