Re: reftable [v5]: new ref storage format
- Date: Mon, 7 Aug 2017 07:41:43 -0700
- From: Shawn Pearce <spearce@xxxxxxxxxxx>
- Subject: Re: reftable [v5]: new ref storage format
On Sun, Aug 6, 2017 at 4:37 PM, Ben Alex <ben.alex@xxxxxxxxxxxx> wrote:
> Just on the LmdbJava specific pieces:
> On Mon, Aug 7, 2017 at 8:56 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> Looks pretty complete. It's a Java wrapper around the C implementation
>> of LMDB, which may be sufficient for reference storage. Keys are
>> limited to 511 bytes, so insanely long reference names would have to
>> be rejected. Reftable allows reference names up to the file's
>> `page_size`, minus overhead (~15 bytes) and value (20 bytes).
> For clarification, LmdbJava code doesn't enforce a particular key size limit.
> For puts, the caller nominates the size in the buffer they present for
> storage, and for get-style operations (cursors etc) the LMDB database stores
> the key size and LmdbJava adjusts the Java-visible buffer accordingly.
> A 511 byte key limit is specified at compile time for the native LMDB
> library. For convenience the native library is compiled for 64-bit Windows,
> Linux and Mac OS and included in the LmdbJava JAR, and this compilation is
> performed using default values (including the 511 key limit) by the
> https://github.com/lmdbjava/native project. Users can specify a different
> native library to use (e.g. one packaged by their OS or separately compiled
> using an LmdbJava Native-like automatic build) with a larger key size if
> they wish.
> As such, if JGit wanted to use a longer key size, it is possible to implement
> similar automatic builds and packaging into JGit.
I don't know if we need a larger key size. $DAY_JOB limits ref names
to ~200 bytes in a hook. I think GitHub does similar. But I'm worried
about the general masses who might be using our software and expect
ref names thus far to be as long as PATH_MAX on their system. Most
systems run PATH_MAX around 1024.
The limitation of needing native JARs, and having such a low
compile-time constant, may be annoying to some.
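
To make the two limits concrete, here is a rough sketch in Python (not
JGit or LmdbJava code). The 511-byte key limit and the ~15 byte overhead
and 20 byte value figures are the ones quoted in this thread; the
4096-byte page size and the helper names are only assumptions for
illustration.

```python
# Figures quoted in this thread; the overhead is approximate ("~15 bytes").
LMDB_DEFAULT_MAX_KEY = 511   # default compile-time key limit in native LMDB
REFTABLE_OVERHEAD = 15       # approximate per-record overhead
REFTABLE_VALUE_LEN = 20      # size of the stored value


def fits_lmdb_key(refname: str, max_key: int = LMDB_DEFAULT_MAX_KEY) -> bool:
    """Return True if the UTF-8 encoded ref name fits in a single LMDB key."""
    return len(refname.encode("utf-8")) <= max_key


def reftable_max_name_len(page_size: int) -> int:
    """Approximate longest ref name a reftable with this page size could hold."""
    return page_size - REFTABLE_OVERHEAD - REFTABLE_VALUE_LEN


# Hypothetical inputs: an ordinary branch and a 1024-byte ref name.
short_ref = "refs/heads/master"
long_ref = "refs/heads/" + "x" * 1013

print(fits_lmdb_key(short_ref))
print(fits_lmdb_key(long_ref))
print(len(long_ref.encode("utf-8")) <= reftable_max_name_len(4096))
```

With these numbers, a ref name of around 1024 bytes would be rejected by
a default LMDB build but would still fit in a reftable with a 4096-byte
page size.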
>> A downside for JGit is getting these two open source projects cleared.
>> We would have to get approval from our sponsor (Eclipse Foundation) to
>> use both lmdbjava (Apache License) and LMDB (LMDB license).
> I can't speak for the other contributors, but I'm happy to review LmdbJava's
> license if this would help. For example, changing to the OpenLDAP License would
> seem a reasonable variation given users of LmdbJava already need to accept
> the OpenLDAP License to use it. Kristoffer, do you have thoughts on this?
Thanks for considering it, but please don't change your licensing just
because of JGit. It's unlikely we can use LMDB for a lot of technical reasons.
>> Plus it
>> looks like lmdbjava still relies on local disk and isn't giving us a
>> way to patch in a virtual filesystem the way I need to at $DAY_JOB.
> LMDB's mdb_env_open requires a const char* path, so we can pass through any
> char array desired. But I think you'll find LMDB native can't map to a
> virtual file system implemented by JVM code (the LMDB caveats section has
> further local file system considerations).
Mostly at $DAY_JOB it's because we can't virtualize the filesystem
calls the C library is doing.
In git-core, I'm worried about the caveats related to locking. Git
tries to work nicely on NFS, and it seems LMDB wouldn't. Git also runs
fine on a read-only filesystem, and LMDB gets a little weird about
that. Finally, Git doesn't have nearly the risks LMDB has about a
crashed reader or writer locking out future operations until the locks
have been resolved. This is especially true with shared user
repositories, where another user might set up and own the semaphore.