Web lists-archives.com

Re: [RFC PATCH] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default

2018-03-08 21:29 GMT+02:00 Eddy Petrișor <eddy.petrisor@xxxxxxxxx>:
> 2018-03-06 22:21 GMT+02:00 Jonathan Nieder <jrnieder@xxxxxxxxx>:
>> (cc list snipped)
>> Hi,
>> Eddy Petrișor wrote:
>> > Cc: [a lot of people]
>> Can you say a little about how this cc list was created?  E.g. should
>> "git send-email" get a feature to warn about long cc lists?
> I did it as advised by the  documentation, git blame:
> https://github.com/git/git/blob/master/Documentation/SubmittingPatches#L264
> I was suprised that there is no specialized script that does this, as
> it is for the kernel or u-boot, so I ran first
>     git log --pretty=format:'%an <%ae>' git-submodule.sh | sort -u >
> mail-list-submodule
> then configure git to use that custom output and ignore the patch
> since it was trying to convert every line of it into a email address:
>     git config sendemail.ccCmd 'cat mail-list-submodule # '
> Then "git send-email 0001..." did the rest.
>> > Signed-off-by: Eddy Petrișor <eddy.petrisor@xxxxxxxxx>
>> > ---
>> >
>> > There are projects such as llvm/clang which use several repositories, and they
>> > might be forked for providing support for various features such as adding Redox
>> > awareness to the toolchain. This typically means the superproject will use
>> > another branch than master, occasionally even use an old commit from that
>> > non-master branch.
>> >
>> > Combined with the fact that when incorporating such a hierachy of repositories
>> > usually the user is interested in just the exact commit specified in the
>> > submodule info, it follows that a desireable usecase is to be also able to
>> > provide '--depth 1' to avoid waiting for ages for the clone operation to
>> > finish.
>> Some previous discussion is at
>> https://public-inbox.org/git/CAGZ79ka6UXKyVLmdLg_M5-sB1x96g8FRzRZy=ENy5aJBQf9_QA@xxxxxxxxxxxxxx/.
>> In theory this should be straightforward: Git protocol allows fetching
>> an arbitrary commit, so "git submodule update" and similar commands
>> could fetch the submodule commit by SHA-1 instead of by refname.  Poof!
>> Problem gone.
>> In practice, some complications:
>>  - some servers do not permit fetch-by-sha1.  For example, github does
>>    not permit it.  This is governed by the
>>    uploadpack.allowReachableSHA1InWant / uploadpack.allowAnySHA1InWant
>>    configuration items.
> That is the problem I have faced since I tried to clone this repo
> which has at lest 2 levels of submodules:
> https://github.com/redox-os/redox
> The problematic modules are:
> rust @ https://github.com/redox-os/rust/tree/81c2bf4e51647295d3d92952dbb0464b460df0c3
> - first level
> and
> rust/src/llvm @
> https://github.com/rust-lang/llvm/tree/ba2edd794c7def715007931fcd1b4ce62aa711c8
>>    That should be surmountable by making the behavior conditional, but
>>    it's a complication.
> Which forced me to try to fetch a branch on which that commit exists,
> but also consider providing --depth for the submodule checkout,
> effectively minimizing the amount of brought in history.
>>  - When the user passes --depth=<num>, do they mean that to apply to
>>    the superproject, to the submodules, or both?  Documentation should
>>    make the behavior clear.
> The supermodule clone already exists and that works OK; I was after
> providing something like 'git submodule update --depth 20 --recursive'
> or evn providing that 'depth' info via the .gitmodules parameters
> since 'shallow' is already used somehow, yet that is a bool value, not
> an integer, like depth expects:
> eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules
> --list | egrep '(depth|shallow)'
> submodule.src/llvm.shallow=true
> submodule.src/rt/hoedown.shallow=true
> submodule.src/jemalloc.shallow=true
> submodule.src/liblibc.shallow=true
> submodule.src/doc/nomicon.shallow=true
> submodule.src/tools/cargo.shallow=true
> submodule.src/doc/reference.shallow=true
> submodule.src/doc/book.shallow=true
> submodule.src/tools/rls.shallow=true
> submodule.src/libcompiler_builtins.shallow=true
> submodule.src/tools/clippy.shallow=true
> submodule.src/tools/rustfmt.shallow=true
> submodule.src/tools/miri.shallow=true
> submodule.src/dlmalloc.shallow=true
> submodule.src/binaryen.shallow=true
> submodule.src/doc/rust-by-example.shallow=true
> submodule.src/llvm-emscripten.shallow=true
> submodule.src/tools/rust-installer.shallow=true
>> > Git submodule seems to be very stubborn and cloning master, although the
>> > wrapper script and the gitmodules-helper could work together to clone directly
>> > the branch specified in the .gitmodules file, if specified.
>> This could make sense.  For the same reason as --depth in the
>> superproject gives ambiguous signals about what should happen in
>> submodules, --single-branch in the superproject gives ambiguous
>> signals about what branch to fetch in submodules.
> I never thought of providing the branch in the command line, since
> that's not versionable info, but to provide git-submodule a hint in
> the .gitsubmodule config about on which branch the hash in question
> should be found, like this:
> eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules
> --list | egrep branch
> submodule.src/llvm.branch=rust-llvm-release-4-0-1
> submodule.src/rt/hoedown.branch=rust-2015-09-21-do-not-delete
>> > Another wrinkle is that when the commit is not the tip of the branch, the depth
>> > parameter should somehow be stored in the .gitmodules info, but any change in
>> > the submodule will break the supermodule submodule depth info sooner or later,
>> > which is definitly frigile.
>> Hm, this seems to go in another direction.  I don't think we should
>> store the depth parameter in the .gitmodules file, since different
>> users are likely to have different preferences about what to make
>> shallow.  If we make --depth easy enough to use at the superproject
>> level then the user can specify what they want there.
> Yes, but
> a) if the commit is not on the tip of that branch,
> b) the server does not support sha1 fetching and
> c) it is not expected to typically work in the submodules from the
> superproject checkout (e.g submodule is under another team's control)
> the typical usecase could be that the code is question is read-only
> and it should be OK to typically fetch a shallow clone of the
> submodule (by default). The only compromise at that point is to either
> choose a afe margin for the defalt depth in case the module branch is
> being modified.
> I solved a similar problem at work where having 2 shallow branches
> which had to be merged; since when trying to merge two branches
> without apprent common history git will do a merge commit, when a
> simple ff woudl wor, I wanted to find the merge-base.
> SInce the brances were shallow and they had to have a merge-base, the
> solution was to was to simply increase the --depth parameter in
> predifined steps (of 10 commits) and fetch both branches, check if the
> 'git merge-base'  command manged to find a commit; if not the depthc
> parameter woudl be increased and the whole process would be started
> until that merge-base was found.

Another optimization could be for submodule.<path>.shallow to imply
--single-branch, too.

> I wonder if the same couldn't be done for the submodules that have
> submodule.<dir>.shallow setting, making the depth parameter hardcoding
> unnecessary, and guranteeing the submodule could be cloned at any
> given time, as long as the branch specified by submodule.<dir>.branch
> is not removed from the submodule repo.
>> > I tried digging into this section of the code and debugging with bashdb to see
>> > where --depth might fit, but I got stuck on the shell-to-helper interaction and
>> > the details of the submodule implementation, so I want to lay out this first
>> > patch as starting point for the discussion in the hope somebody else picks it
>> > up or can provide some inputs. I have the feeling there are multiple code paths
>> > that are being ran, depending on the moment (initial clone, submodule
>> > recursive, post-clone update etc.) and I have a gut feeling there shouldn't be
>> > any code duplication just because the operation is different.
>> >
>> > This first patch is only trying to use a non-master branch, I have some changes
>> > for the --depth part, but I stopped working on it due to the "default depth"
>> > issue above.
>> >
>> > Does any of this sound reasonable?
>> > Is this patch idea usable or did I managed to touch the part of the code that
>> > should not be touched?
>> I agree with the goal.  As mentioned above, I'm not confident about
>> the particular mechanism --- e.g. something using fetch-by-sha1 seems
>> likely to be more intuitive.
>> Today, the 'branch' setting in .gitmodules is only for "git submodule
>> update --remote".  This patch would be a significant expansion in
>> scope for it.  Hopefully others on the list can talk more about how
>> that fits into various workflows and whether it would work out well.
> Thanks for the input, it was helpful anyway.
> Now that I think of it, I think the "fetch more and more from the
> branch until we find the sha1" could work; in the worst case you would
> end up going to the beginning of the repo in case the submodule
> history was rewritten or if the branch specified in the supermodule
> submodule.<path>.branch was removed.

Actually, the branch removal in the submodule would lead to a failure
to fetch from the get go, so the worst case is if history is rewritten
because the sha1 would be missing fromt he branch.
Antoher non-fatal perfomance issue would be if the desired commit is
very close to the initial commit of the repo.

> I am aware this is not ellegant at all (what is the default "step"?),
> yet the submodule part of the code already struggles in that area
> because it clearly looks like an afterthought, anyway.

Eddy Petrișor