[PATCH v5 12/16] partial-clone: add multiple remotes in the doc
- Date: Tue, 9 Apr 2019 18:11:12 +0200
- From: Christian Couder <christian.couder@xxxxxxxxx>
- Subject: [PATCH v5 12/16] partial-clone: add multiple remotes in the doc
While at it, let's remove a reference to ODB effort as the ODB
effort has been replaced by directly enhancing partial clone
and promisor remote features.
Signed-off-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
Documentation/technical/partial-clone.txt | 117 ++++++++++++++++------
1 file changed, 84 insertions(+), 33 deletions(-)
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
index 896c7b3878..210373e258 100644
@@ -30,12 +30,20 @@ advance* during clone and fetch operations and thereby reduce download
times and disk usage. Missing objects can later be "demand fetched"
+A remote that can later provide the missing objects is called a
+promisor remote, as it promises to send the objects when
+requested. Initialy Git supported only one promisor remote, the origin
+remote from which the user cloned and that was configured in the
+"extensions.partialClone" config option. Later support for more than
+one promisor remote has been implemented.
Use of partial clone requires that the user be online and the origin
-remote be available for on-demand fetching of missing objects. This may
-or may not be problematic for the user. For example, if the user can
-stay within the pre-selected subset of the source tree, they may not
-encounter any missing objects. Alternatively, the user could try to
-pre-fetch various objects if they know that they are going offline.
+remote or other promisor remotes be available for on-demand fetching
+of missing objects. This may or may not be problematic for the user.
+For example, if the user can stay within the pre-selected subset of
+the source tree, they may not encounter any missing objects.
+Alternatively, the user could try to pre-fetch various objects if they
+know that they are going offline.
@@ -100,18 +108,18 @@ or commits that reference missing trees.
Handling Missing Objects
-- An object may be missing due to a partial clone or fetch, or missing due
- to repository corruption. To differentiate these cases, the local
- repository specially indicates such filtered packfiles obtained from the
- promisor remote as "promisor packfiles".
+- An object may be missing due to a partial clone or fetch, or missing
+ due to repository corruption. To differentiate these cases, the
+ local repository specially indicates such filtered packfiles
+ obtained from promisor remotes as "promisor packfiles".
These promisor packfiles consist of a "<name>.promisor" file with
arbitrary contents (like the "<name>.keep" files), in addition to
their "<name>.pack" and "<name>.idx" files.
- The local repository considers a "promisor object" to be an object that
- it knows (to the best of its ability) that the promisor remote has promised
- that it has, either because the local repository has that object in one of
+ it knows (to the best of its ability) that promisor remotes have promised
+ that they have, either because the local repository has that object in one of
its promisor packfiles, or because another promisor object refers to it.
When Git encounters a missing object, Git can see if it is a promisor object
@@ -123,12 +131,12 @@ expensive-to-modify list of missing objects.[a]
- Since almost all Git code currently expects any referenced object to be
present locally and because we do not want to force every command to do
a dry-run first, a fallback mechanism is added to allow Git to attempt
- to dynamically fetch missing objects from the promisor remote.
+ to dynamically fetch missing objects from promisor remotes.
When the normal object lookup fails to find an object, Git invokes
-fetch-object to try to get the object from the server and then retry
-the object lookup. This allows objects to be "faulted in" without
-complicated prediction algorithms.
+promisor_remote_get_direct() to try to get the object from a promisor
+remote and then retry the object lookup. This allows objects to be
+"faulted in" without complicated prediction algorithms.
For efficiency reasons, no check as to whether the missing object is
actually a promisor object is performed.
@@ -157,8 +165,7 @@ and prefetch those objects in bulk.
We are not happy with this global variable and would like to remove it,
but that requires significant refactoring of the object code to pass an
-additional flag. We hope that concurrent efforts to add an ODB API can
Fetching Missing Objects
@@ -182,21 +189,63 @@ has been updated to not use any object flags when the corresponding argument
though they are not necessary.
+Using many promisor remotes
+Many promisor remotes can be configured and used.
+This allows for example a user to have multiple geographically-close
+cache servers for fetching missing blobs while continuing to do
+filtered `git-fetch` commands from the central server.
+When fetching objects, promisor remotes are tried one after the other
+until all the objects have been fetched.
+Remotes that are considered "promisor" remotes are those specified by
+the following configuration variables:
+- `extensions.partialClone = <name>`
+- `remote.<name>.promisor = true`
+- `remote.<name>.partialCloneFilter = ...`
+Only one promisor remote can be configured using the
+`extensions.partialClone` config variable. This promisor remote will
+be the last one tried when fetching objects.
+We decided to make it the last one we try, because it is likely that
+someone using many promisor remotes is doing so because the other
+promisor remotes are better for some reason (maybe they are closer or
+faster for some kind of objects) than the origin, and the origin is
+likely to be the remote specified by extensions.partialClone.
+This justification is not very strong, but one choice had to be made,
+and anyway the long term plan should be to make the order somehow
+For now though the other promisor remotes will be tried in the order
+they appear in the config file.
-- The remote used for a partial clone (or the first partial fetch
- following a regular clone) is marked as the "promisor remote".
+- It is not possible to specify the order in which the promisor
+ remotes are tried in other ways than the order in which they appear
+ in the config file.
-We are currently limited to a single promisor remote and only that
-remote may be used for subsequent partial fetches.
+It is also not possible to specify an order to be used when fetching
+from one remote and a different order when fetching from another
+- It is not possible to push only specific objects to a promisor
-We accept this limitation because we believe initial users of this
-feature will be using it on repositories with a strong single central
+It is not possible to push at the same time to multiple promisor
+remote in a specific order.
-- Dynamic object fetching will only ask the promisor remote for missing
- objects. We assume that the promisor remote has a complete view of the
+- Dynamic object fetching will only ask promisor remotes for missing
+ objects. We assume that promisor remotes have a complete view of the
repository and can satisfy all such requests.
- Repack essentially treats promisor and non-promisor packfiles as 2
@@ -218,15 +267,17 @@ server.
-- Allow more than one promisor remote and define a strategy for fetching
- missing objects from specific promisor remotes or of iterating over the
- set of promisor remotes until a missing object is found.
+- Improve the way to specify the order in which promisor remotes are
-A user might want to have multiple geographically-close cache servers
-for fetching missing blobs while continuing to do filtered `git-fetch`
-commands from the central server, for example.
+For example this could allow to specify explicitly something like:
+"When fetching from this remote, I want to use these promisor remotes
+in this order, though, when pushing or fetching to that remote, I want
+to use those promisor remotes in that order."
+- Allow pushing to promisor remotes.
-Or the user might want to work in a triangular work flow with multiple
+The user might want to work in a triangular work flow with multiple
promisor remotes that each have an incomplete view of the repository.
- Allow repack to work on promisor packfiles (while keeping them distinct