
Re: en/fast-export-encoding, was Re: What's cooking in git.git (May 2019, #01; Thu, 9)




Hi Hannes & Elijah,

On Fri, 10 May 2019, Johannes Sixt wrote:

> Am 10.05.19 um 02:14 schrieb Elijah Newren:
> > Hi Johannes,
> >
> > On Thu, May 9, 2019 at 1:46 PM Johannes Schindelin
> > <Johannes.Schindelin@xxxxxx> wrote:
> >>
> >> Hi Junio & Elijah,
> >>
> >> On Thu, 9 May 2019, Junio C Hamano wrote:
> >>
> >>> * en/fast-export-encoding (2019-05-07) 5 commits
> >>>  - fast-export: do automatic reencoding of commit messages only if requested
> >>>  - fast-export: differentiate between explicitly utf-8 and implicitly utf-8
> >>>  - fast-export: avoid stripping encoding header if we cannot reencode
> >>>  - fast-import: support 'encoding' commit header
> >>>  - t9350: fix encoding test to actually test reencoding
> >>>
> >>>  The "git fast-export/import" pair has been taught to handle commits
> >>>  with log messages in encoding other than UTF-8 better.
> >>
> >> This breaks on Windows, see
> >> https://dev.azure.com/gitgitgadget/git/_build/results?buildId=8298&view=ms.vss-test-web.build-test-results-tab
> >>
> >> Sadly, I ran out of time looking at it in detail.
> >
> > Thanks for the heads up, and for taking some time to check it out.
> > The error doesn't seem obvious from the log.  Does Azure Pipelines
> > have anything like CircleCI's "Debug with SSH" feature[1]?  (Where one
> > can click a "Rerun job with SSH", and it'll restart the pipeline but
> > also print out an ssh command someone can use to directly access the
> > box on which the test is running, in order to be able to investigate.)
> >  Failing that, assuming I can find a Windows system somewhere, is
> > there a list of steps for setting up a development environment and
> > building git on Windows?
>
> I'll just tell you why things go wrong here:
>
> In these cases, a byte that is intended to be an ISO8859-something
> character is passed via the command line. This cannot work as intended
> on Windows, because the command line is not just a stream of bytes, but
> a string of characters. On Windows (and presumably also on macOS), the
> command line bytes are interpreted as UTF-8. As such, the bytes undergo
> some encoding conversions between UTF-8 and UTF-16LE. That cannot work
> when the bytes are not correct UTF-8 characters.
>
> To make the tests pass you have to pass the ISO8859-something characters
> via a file.

Thanks for the explanation. Yes, we cannot rely on command lines (or, for
that matter, environment variables) being opaque byte sequences, as that
does not work on Windows: there, byte sequences *always* have an encoding,
and are pretty much always converted into UTF-16 before continuing.
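
To illustrate, here is a minimal Python sketch of the effect. It is not the
actual code path in Git or Windows: the function names and the sample
message are made up for illustration, and the "replace invalid bytes"
behavior is an assumption standing in for whatever the real conversion
does. It only shows why a non-UTF-8 byte cannot survive a round trip
through UTF-16, while the same bytes written to a file come back unchanged.

```python
import os
import tempfile

# Hypothetical commit-message bytes: "Piña" encoded as ISO-8859-1.
# The byte 0xF1 on its own is not a valid UTF-8 sequence.
latin1_msg = "Pi\u00f1a".encode("iso-8859-1")  # b'Pi\xf1a'


def roundtrip_like_command_line(raw: bytes) -> bytes:
    """Rough stand-in for bytes -> UTF-16LE -> bytes on a command line.

    Assumption: the input is interpreted as UTF-8 and invalid bytes are
    replaced with U+FFFD; the real conversion may behave differently.
    """
    text = raw.decode("utf-8", errors="replace")
    wide = text.encode("utf-16-le")
    return wide.decode("utf-16-le").encode("utf-8")


def roundtrip_via_file(raw: bytes) -> bytes:
    """Write the bytes to a temporary file and read them back untouched."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


mangled = roundtrip_like_command_line(latin1_msg)
intact = roundtrip_via_file(latin1_msg)

print("original:", latin1_msg)
print("via 'command line':", mangled)
print("via file:", intact)
```

In this sketch the file round trip preserves the ISO-8859-1 byte, while the
simulated command-line round trip does not, which is why the test has to
feed such bytes to git via a file.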

As to Debug with SSH: this is not possible in Azure Pipelines. What I
frequently do is to heavily edit azure-pipelines.yml (usually restricting
it to one particular job, e.g. the Windows build, and to one particular
test script) as well as ci/ and t/, to get tons of debug information, then
open a PR on GitGitGadget to start a build.

That's how I investigated the macOS Mojave breakages, for example.

Ciao,
Dscho