Re: Finer timestamps and serialization in git
- Date: Mon, 20 May 2019 13:22:15 -0400
- From: Derrick Stolee <stolee@xxxxxxxxx>
- Subject: Re: Finer timestamps and serialization in git
On 5/20/2019 12:36 PM, Eric S. Raymond wrote:
> Michal Suchánek <msuchanek@xxxxxxx>:
>> On Wed, 15 May 2019 21:25:46 -0400
>> Derrick Stolee <stolee@xxxxxxxxx> wrote:
>>> On 5/15/2019 8:28 PM, Eric S. Raymond wrote:
>>>> Derrick Stolee <stolee@xxxxxxxxx>:
>>>>> What problem are you trying to solve where commit date is important?
>>>> B. Unique canonical form of import-stream representation.
>>>> Reposurgeon is a very complex piece of software with subtle failure
>>>> modes. I have a strong need to be able to regression-test its
>>>> operation. Right now there are important cases in which I can't do
>>>> that because (a) the order in which it writes commits and (b) how it
>>>> colors branches, are both phase-of-moon dependent. That is, the
>>>> algorithms may be deterministic but they're not documented and seem to
>>>> be dependent on variables that are hidden from me.
>>>> Before import streams can have a canonical output order without hidden
>>>> variables (e.g. depending only on visible metadata) in practice, that
>>>> needs to be possible in principle. I've thought about this a lot and
>>>> not only are unique commit timestamps the most natural way to make
>>>> it possible, they're the only way consistent with the reality that
>>>> commit comments may be altered for various good reasons during
>>>> repository translation.
>>> If you are trying to debug or test something, why don't you serialize
>>> the input you are using for your test?
>> And that's the problem. Serialization of a git repository is not stable
>> because there is no total ordering on commits. And for testing you need
>> to serialize some 'before' and 'after' state and they can be totally
>> different. Not because the repository state is totally different but
>> because the serialization of the state is not stable.
> Yes, msuchanek is right - that is exactly the problem. Very well put.
> git fast-import streams *are* the serialization; they're what reposurgeon
> ingests and emits. The concrete problem I have is that there is no stable
> correspondence between a repository and one canonical fast-import
> serialization of it.
> That is a bigger pain in the ass than you will be able to imagine unless
> and until you try writing surgical tools yourself and discover that you
> can't write tests for them.
What it sounds like you are doing is piping a 'git fast-import' process into
reposurgeon, and testing that reposurgeon does the same thing every time.
Of course this won't be consistent if 'git fast-import' isn't consistent.
But what you should do instead is store a fixed file from one run of
'git fast-import' and send that file to reposurgeon for the repeated test.
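As a rough sketch of what I mean (not reposurgeon's actual test harness), the
Python below feeds a stored stream file to a command on stdin and compares the
result to a saved expected output. The names run_on_fixture, matches_golden,
STAND_IN and demo are hypothetical, and STAND_IN is only a placeholder process
that copies stdin to stdout; substitute whatever command line your tool really
takes.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_on_fixture(command, fixture):
    """Feed a stored stream file to `command` on stdin and return its stdout."""
    with open(fixture, "rb") as stream:
        result = subprocess.run(
            command, stdin=stream, stdout=subprocess.PIPE, check=True
        )
    return result.stdout


def matches_golden(command, fixture, golden):
    """Return True if the output for the fixed input equals the saved output."""
    return run_on_fixture(command, fixture) == Path(golden).read_bytes()


# Placeholder for the tool under test (an assumption for illustration only):
# a child Python process that copies stdin to stdout unchanged.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]


def demo():
    """Store a tiny illustrative stream once, then test against that file."""
    stream = b"blob\nmark :1\ndata 6\nhello\n\n"
    with tempfile.TemporaryDirectory() as tmp:
        fixture = Path(tmp) / "input.fi"
        golden = Path(tmp) / "expected.fi"
        fixture.write_bytes(stream)  # captured once, then checked in
        golden.write_bytes(stream)   # expected output for the stand-in
        return matches_golden(STAND_IN, fixture, golden)
```

The point is that the input file is captured once and kept with the tests, so
the only thing that can vary between runs is the tool being tested.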
Don't rely on fast-import being consistent and instead use fixed input for
your test.
If reposurgeon is providing the input to _and_ consuming the output from
'git fast-import', then yes you will need to have at least one integration
test that runs the full pipeline. But for regression tests covering complicated
logic in reposurgeon, you're better off splitting the test (or mocking out
'git fast-import' with something that provides consistent output given