Re: Finer timestamps and serialization in git




On 5/15/2019 8:28 PM, Eric S. Raymond wrote:
> Derrick Stolee <stolee@xxxxxxxxx>:
>> What problem are you trying to solve where commit date is important?
> 
> I don't know what Jason's are.  I know what mine are.
> 
> A. Portable commit identifiers
> 
> 1. When I in-migrate a repository from (say) Subversion with
> reposurgeon, I want to be able to patch change comments so that (say)
> r2367 becomes a unique reference to its corresponding commit. I do
> not want the kludge of appending a relic SVN-ID header to be *required*,
> though some customers may choose that. Requiring that is an orthogonality
> violation.

Instead of using the free-form nature of a commit message to include links
to an external VCS, you want a first-class data type in Git to provide this
data? Not only is that backwards, it makes the link between the Git repo and
the SVN repo weaker. How would you distinguish between a commit generated from
the old SVN repo and a commit that was created directly in the Git repo without
performing a lookup to the SVN repo based on (committer, timestamp)?

> 2. Because I think in decadal timescales about infrastructure, I want
> my commit references to be in a format that won't break when the history
> is forward-migrated to the *next* VCS. That pretty much eliminates any
> form of opaque hash. (Git itself will have a weaker version of this problem
> when you change hash formats.)
> 
> 3. Accordingly, I invented action stamps. This is an action stamp:
> <esr@xxxxxxxxxxx!2019-05-15T20:01:15Z>. One reason I want timestamp
> uniqueness is for action-stamp uniqueness.

Looks like you have an excellent format for a backwards-facing link.
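To make the uniqueness concern concrete, here is a small Python sketch that formats and parses stamps of the shape shown above, `<email!YYYY-MM-DDTHH:MM:SSZ>`. The regex and the function names are hypothetical and inferred from that single example, not from any published spec. It also shows why second-resolution stamps depend on timestamp uniqueness: two commits by the same committer within one second render identically.

```python
import re
from datetime import datetime, timezone

# Assumed stamp shape, inferred from one example: <email!YYYY-MM-DDTHH:MM:SSZ>
_STAMP_RE = re.compile(r"^<([^!<>]+)!(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)>$")
_FMT = "%Y-%m-%dT%H:%M:%SZ"


def format_action_stamp(email: str, when: datetime) -> str:
    """Render a stamp from a committer email and a timezone-aware time."""
    utc = when.astimezone(timezone.utc)
    return f"<{email}!{utc.strftime(_FMT)}>"  # sub-second precision is dropped


def parse_action_stamp(stamp: str) -> tuple[str, datetime]:
    """Split a stamp back into (email, aware UTC datetime)."""
    m = _STAMP_RE.match(stamp)
    if m is None:
        raise ValueError(f"not an action stamp: {stamp!r}")
    when = datetime.strptime(m.group(2), _FMT).replace(tzinfo=timezone.utc)
    return m.group(1), when


t = datetime(2019, 5, 15, 20, 1, 15, tzinfo=timezone.utc)
first = format_action_stamp("esr@example.com", t)
second = format_action_stamp("esr@example.com", t.replace(microsecond=500_000))
print(first)            # <esr@example.com!2019-05-15T20:01:15Z>
print(first == second)  # True: two commits in the same second collide
```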

Gerrit uses a commit-msg hook [1] to insert "Change-Id" tags into
commit messages. You could probably do something similar. If you have
control over _every_ client interacting with the repo, you could even
have this interact with a central authority to give a unique stamp.
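As a rough illustration of "something similar", here is a hypothetical commit-msg hook in Python. Git does invoke commit-msg with a single argument, the path of the file holding the proposed message, but the "Action-Stamp:" trailer name and stamp format are assumptions modeled on the stamps above; this is not Gerrit's hook and not an existing Git convention. It only stamps commits on clients that install it, and it does nothing to guarantee uniqueness across clones.

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook: append an "Action-Stamp:" trailer.

Install as .git/hooks/commit-msg and make it executable. The trailer name
and stamp format are illustrative assumptions, not a Git/Gerrit convention.
"""
import subprocess
import sys
from datetime import datetime, timezone

TRAILER = "Action-Stamp:"


def add_stamp(message: str, email: str, when: datetime) -> str:
    """Return message with a stamp trailer appended, unless one exists."""
    if any(line.startswith(TRAILER) for line in message.splitlines()):
        return message  # idempotent, so amending keeps the original stamp
    stamp = f"<{email}!{when.astimezone(timezone.utc):%Y-%m-%dT%H:%M:%SZ}>"
    # Simplification: always starts a new paragraph. `git interpret-trailers`
    # would merge properly with existing trailers such as Signed-off-by.
    return f"{message.rstrip()}\n\n{TRAILER} {stamp}\n"


def main(path: str) -> int:
    email = subprocess.run(
        ["git", "config", "user.email"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    with open(path, encoding="utf-8") as f:
        msg = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(add_stamp(msg, email, datetime.now(timezone.utc)))
    return 0  # a non-zero exit would abort the commit


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```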

> B. Unique canonical form of import-stream representation.
> 
> Reposurgeon is a very complex piece of software with subtle failure
> modes.  I have a strong need to be able to regression-test its
> operation.  Right now there are important cases in which I can't do
> that because (a) the order in which it writes commits and (b) how it
> colors branches, are both phase-of-moon dependent.  That is, the
> algorithms may be deterministic but they're not documented and seem to
> be dependent on variables that are hidden from me.
> 
> Before import streams can have a canonical output order without hidden
> variables (e.g. depending only on visible metadata) in practice, that
> needs to be possible in principle. I've thought about this a lot and
> not only are unique commit timestamps the most natural way to make
> it possible, they're the only way consistent with the reality that
> commit comments may be altered for various good reasons during
> repository translation.

If you are trying to debug or test something, why don't you serialize
the input you are using for your test?
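For concreteness, the kind of canonical order argued for in B could be produced by a topological sort (parents before children) that breaks ties by commit timestamp, sketched below in Python using Kahn's algorithm with a heap. The data model is an assumption: `parents` maps each commit id to its parent ids (every parent must also be a key) and `timestamps` maps ids to integer times. With unique timestamps the output depends only on that visible metadata; on a collision the sketch falls back to comparing ids, which is exactly the dependence the quoted text wants to avoid, since ids change when messages are rewritten.

```python
import heapq


def canonical_order(parents: dict[str, list[str]],
                    timestamps: dict[str, int]) -> list[str]:
    """Topologically sort commits, breaking ties by (timestamp, id)."""
    children: dict[str, list[str]] = {c: [] for c in parents}
    pending = {c: len(ps) for c, ps in parents.items()}  # unemitted parents
    for c, ps in parents.items():
        for p in ps:
            children[p].append(c)
    # Min-heap of commits whose parents have all been emitted.
    ready = [(timestamps[c], c) for c, n in pending.items() if n == 0]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        _, c = heapq.heappop(ready)
        order.append(c)
        for child in children[c]:
            pending[child] -= 1
            if pending[child] == 0:
                heapq.heappush(ready, (timestamps[child], child))
    if len(order) != len(parents):
        raise ValueError("commit graph contains a cycle")
    return order
```

For a merge `d` of branches `b` (time 3) and `c` (time 2) forked from `a`, this yields `a, c, b, d` regardless of the order in which the dictionary entries were built.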

>> P.S. All of my (overly strong) opinions on using commit date are made
>> more valid when you realize anyone can set GIT_COMMITTER_DATE to get
>> an arbitrary commit date.
> 
> In the way I would write things, you can *request* that date, but in
> case of a collision you might actually get one a few microseconds off
> that preserves its order relationship with your other commits.

As mentioned above, you need to make this request at the time the commit
is created, and you'll need to communicate with a central authority. That
goes against the distributed nature of Git.
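A hypothetical allocator implementing the "bump on collision" rule from the quoted paragraph shows where that central authority comes in. Times are assumed to be integer microseconds, and the class and method names are illustrative only. The set of issued stamps is state that every committer would have to share; two independent clones each holding their own allocator will happily hand out the same value.

```python
class StampAllocator:
    """Hypothetical allocator: on collision, move forward 1 microsecond at a time.

    Uniqueness holds only among callers that share this object's state.
    Ordering is preserved only when callers request increasing times.
    """

    def __init__(self) -> None:
        self._issued: set[int] = set()

    def allocate(self, requested_us: int) -> int:
        t = requested_us
        while t in self._issued:  # collision: nudge a few microseconds later
            t += 1
        self._issued.add(t)
        return t


alice, bob = StampAllocator(), StampAllocator()  # e.g. two separate clones
print(alice.allocate(1_000), alice.allocate(1_000))  # 1000 1001
print(bob.allocate(1_000))  # 1000, colliding with alice's first stamp
```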

In my opinion, Git already gives you the flexibility to achieve the goals
you are looking for. But changing a core data type to make your goals
slightly more convenient is not a valuable exercise.

-Stolee

[1] https://gerrit-review.googlesource.com/Documentation/cmd-hook-commit-msg.html