Web lists-archives.com

Re: git, monorepos, and access control




On Wed, Dec 05 2018, Coiner, John wrote:

> I'm an engineer with AMD. I'm looking at whether we could switch our
> internal version control to a monorepo, possibly one based on git and
> VFSForGit.
>
> One obstacle to moving AMD to git/VFSForGit is the lack of access
> control support in git. AMD has a lot of data whose distribution must be
> limited. Sometimes it's a legal requirement, eg. CPU core designs are
> covered by US export control laws and not all employees may see them.
> Sometimes it's a contractual obligation, as when a third party shares
> data with us and we agree only to share this data with certain
> employees. Any hypothetical AMD monorepo should be able to securely deny
> read access in certain subtrees to users without required permissions.
>
> Has anyone looked at adding access control to git, at a per-directory
> granularity? Is this a feature that the git community would possibly
> welcome?
>
> Here's my rough thinking about how it might work:
>   - an administrator can designate that a tree object requires zero or
> more named privileges to read
>   - when a mortal user attempts to retrieve the tree object, a hook
> allows the server to check if the user has a given privilege. The hook
> can query an arbitrary user/group data base, LDAP or whatever. The
> details of this check are mostly in the hook; git only knows about
> abstract named privileges.
>   - if the user has permission, everything goes as normal.
>   - if the user lacks permission, they get a DeniedTree object which
> might carry some metadata about what permissions would be needed to see
> more. The DeniedTree lacks the real tree's entries. (TBD, how do we
> render a denied tree in the workspace? An un-writable directory
> containing only a GITDENIED file with some user friendly error message?)
>   - hashes are secret. If the hashes from a protected tree leak, the
> data also leaks. No check on the server prevents it from handing out
> contents for correctly-guessed hashes.
>   - mortal users shouldn't be able to alter permissions. Of course,
> mortal users will often modify tree objects that carry permissions. So
> the server should enforce that a user isn't pushing updates that alter
> permissions on the same logical directory.
>
> I would welcome your feedback on whether this idea makes technical
> sense, and whether the feature could ever be a fit for git.
>
> You might ask what alternatives we are looking at. At our scale, we'd
> really want a version control system that implements a virtual
> filesystem. That already limits us to ClearCase, VFSForGit, and maybe
> Vesta among public ones.  Am I missing any? We would also want one that
> permits branching enormous numbers of files without creating enormous
> amounts of data in the repo -- git gets that right, and perforce (our
> status quo) does not. That's how I got onto the idea of adding read
> authorization to git.

All of what you've described is possible to implement in git, but as far
as I know there's no existing implementation of it.

Microsoft's GVFS probably comes closest, and they're actively
upstreaming bits of that, but as far as I know that doesn't in any way
try to solve this "users XYZ can't even list such-and-such a tree"
problem.

Google has much the same problem with Android, i.e. there's some parts
of the giant checkout of multiple repos the "repo" tool manages that
aren't available to some of Google's employees, or only to some partners
etc. They solve this by splitting those at repo boundaries, and are
working on moving that to submodules. Perhaps this would work in your
case? It's not as easy to work with, but could be made to work with
existing software.

In case you haven't, read "SECURITY" in the git-fetch(1) manpage,
although if that could be fixed to work for the case you're describing.

In a hypothetical implementation that does this, you are always going to
need to have something where you hand off the SHA-1 of the
"secret-stuff-here/" top-level tree object to clients that can't access
any of the sub-trees or blobs from that directory, since they're going
to need it to construct a commit object consisting of
"stuff-they-can-access/" plus the current state of "secret-stuff-here/"
to push upstream. But you can hide the blobs & trees under
"secret-stuff-here/" and their SHA-1s.

Note also that in the general case your "is this blob secret to user
xyz" or "is this tree secret to user xyz" will need to deal with there
happening to be some blob or tree in "secret-stuff-here/" that just so
happens to share the exact same tree/blob as something they need under
"stuff-they-can-access/". Think a directory that only contains a
".gitignore" file, or a BSD "LICENSE" file.