Re: [RFC PATCH] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default
- Date: Thu, 8 Mar 2018 21:41:24 +0200
- From: Eddy Petrișor <eddy.petrisor@xxxxxxxxx>
- Subject: Re: [RFC PATCH] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default
2018-03-08 21:29 GMT+02:00 Eddy Petrișor <eddy.petrisor@xxxxxxxxx>:
> 2018-03-06 22:21 GMT+02:00 Jonathan Nieder <jrnieder@xxxxxxxxx>:
>> (cc list snipped)
>> Eddy Petrișor wrote:
>> > Cc: [a lot of people]
>> Can you say a little about how this cc list was created? E.g. should
>> "git send-email" get a feature to warn about long cc lists?
> I did it as advised by the documentation, git blame:
> I was suprised that there is no specialized script that does this, as
> it is for the kernel or u-boot, so I ran first
> git log --pretty=format:'%an <%ae>' git-submodule.sh | sort -u >
> then configure git to use that custom output and ignore the patch
> since it was trying to convert every line of it into a email address:
> git config sendemail.ccCmd 'cat mail-list-submodule # '
> Then "git send-email 0001..." did the rest.
>> > Signed-off-by: Eddy Petrișor <eddy.petrisor@xxxxxxxxx>
>> > ---
>> > There are projects such as llvm/clang which use several repositories, and they
>> > might be forked for providing support for various features such as adding Redox
>> > awareness to the toolchain. This typically means the superproject will use
>> > another branch than master, occasionally even use an old commit from that
>> > non-master branch.
>> > Combined with the fact that when incorporating such a hierachy of repositories
>> > usually the user is interested in just the exact commit specified in the
>> > submodule info, it follows that a desireable usecase is to be also able to
>> > provide '--depth 1' to avoid waiting for ages for the clone operation to
>> > finish.
>> Some previous discussion is at
>> In theory this should be straightforward: Git protocol allows fetching
>> an arbitrary commit, so "git submodule update" and similar commands
>> could fetch the submodule commit by SHA-1 instead of by refname. Poof!
>> Problem gone.
>> In practice, some complications:
>> - some servers do not permit fetch-by-sha1. For example, github does
>> not permit it. This is governed by the
>> uploadpack.allowReachableSHA1InWant / uploadpack.allowAnySHA1InWant
>> configuration items.
> That is the problem I have faced since I tried to clone this repo
> which has at lest 2 levels of submodules:
> The problematic modules are:
> rust @ https://github.com/redox-os/rust/tree/81c2bf4e51647295d3d92952dbb0464b460df0c3
> - first level
> rust/src/llvm @
>> That should be surmountable by making the behavior conditional, but
>> it's a complication.
> Which forced me to try to fetch a branch on which that commit exists,
> but also consider providing --depth for the submodule checkout,
> effectively minimizing the amount of brought in history.
>> - When the user passes --depth=<num>, do they mean that to apply to
>> the superproject, to the submodules, or both? Documentation should
>> make the behavior clear.
> The supermodule clone already exists and that works OK; I was after
> providing something like 'git submodule update --depth 20 --recursive'
> or evn providing that 'depth' info via the .gitmodules parameters
> since 'shallow' is already used somehow, yet that is a bool value, not
> an integer, like depth expects:
> eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules
> --list | egrep '(depth|shallow)'
>> > Git submodule seems to be very stubborn and cloning master, although the
>> > wrapper script and the gitmodules-helper could work together to clone directly
>> > the branch specified in the .gitmodules file, if specified.
>> This could make sense. For the same reason as --depth in the
>> superproject gives ambiguous signals about what should happen in
>> submodules, --single-branch in the superproject gives ambiguous
>> signals about what branch to fetch in submodules.
> I never thought of providing the branch in the command line, since
> that's not versionable info, but to provide git-submodule a hint in
> the .gitsubmodule config about on which branch the hash in question
> should be found, like this:
> eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules
> --list | egrep branch
>> > Another wrinkle is that when the commit is not the tip of the branch, the depth
>> > parameter should somehow be stored in the .gitmodules info, but any change in
>> > the submodule will break the supermodule submodule depth info sooner or later,
>> > which is definitly frigile.
>> Hm, this seems to go in another direction. I don't think we should
>> store the depth parameter in the .gitmodules file, since different
>> users are likely to have different preferences about what to make
>> shallow. If we make --depth easy enough to use at the superproject
>> level then the user can specify what they want there.
> Yes, but
> a) if the commit is not on the tip of that branch,
> b) the server does not support sha1 fetching and
> c) it is not expected to typically work in the submodules from the
> superproject checkout (e.g submodule is under another team's control)
> the typical usecase could be that the code is question is read-only
> and it should be OK to typically fetch a shallow clone of the
> submodule (by default). The only compromise at that point is to either
> choose a afe margin for the defalt depth in case the module branch is
> being modified.
> I solved a similar problem at work where having 2 shallow branches
> which had to be merged; since when trying to merge two branches
> without apprent common history git will do a merge commit, when a
> simple ff woudl wor, I wanted to find the merge-base.
> SInce the brances were shallow and they had to have a merge-base, the
> solution was to was to simply increase the --depth parameter in
> predifined steps (of 10 commits) and fetch both branches, check if the
> 'git merge-base' command manged to find a commit; if not the depthc
> parameter woudl be increased and the whole process would be started
> until that merge-base was found.
Another optimization could be for submodule.<path>.shallow to imply
> I wonder if the same couldn't be done for the submodules that have
> submodule.<dir>.shallow setting, making the depth parameter hardcoding
> unnecessary, and guranteeing the submodule could be cloned at any
> given time, as long as the branch specified by submodule.<dir>.branch
> is not removed from the submodule repo.
>> > I tried digging into this section of the code and debugging with bashdb to see
>> > where --depth might fit, but I got stuck on the shell-to-helper interaction and
>> > the details of the submodule implementation, so I want to lay out this first
>> > patch as starting point for the discussion in the hope somebody else picks it
>> > up or can provide some inputs. I have the feeling there are multiple code paths
>> > that are being ran, depending on the moment (initial clone, submodule
>> > recursive, post-clone update etc.) and I have a gut feeling there shouldn't be
>> > any code duplication just because the operation is different.
>> > This first patch is only trying to use a non-master branch, I have some changes
>> > for the --depth part, but I stopped working on it due to the "default depth"
>> > issue above.
>> > Does any of this sound reasonable?
>> > Is this patch idea usable or did I managed to touch the part of the code that
>> > should not be touched?
>> I agree with the goal. As mentioned above, I'm not confident about
>> the particular mechanism --- e.g. something using fetch-by-sha1 seems
>> likely to be more intuitive.
>> Today, the 'branch' setting in .gitmodules is only for "git submodule
>> update --remote". This patch would be a significant expansion in
>> scope for it. Hopefully others on the list can talk more about how
>> that fits into various workflows and whether it would work out well.
> Thanks for the input, it was helpful anyway.
> Now that I think of it, I think the "fetch more and more from the
> branch until we find the sha1" could work; in the worst case you would
> end up going to the beginning of the repo in case the submodule
> history was rewritten or if the branch specified in the supermodule
> submodule.<path>.branch was removed.
Actually, the branch removal in the submodule would lead to a failure
to fetch from the get go, so the worst case is if history is rewritten
because the sha1 would be missing fromt he branch.
Antoher non-fatal perfomance issue would be if the desired commit is
very close to the initial commit of the repo.
> I am aware this is not ellegant at all (what is the default "step"?),
> yet the submodule part of the code already struggles in that area
> because it clearly looks like an afterthought, anyway.