Re: en/fast-export-encoding, was Re: What's cooking in git.git (May 2019, #01; Thu, 9)
- Date: Fri, 10 May 2019 08:21:08 +0200
- From: Johannes Sixt <j6t@xxxxxxxx>
Am 10.05.19 um 02:14 schrieb Elijah Newren:
> Hi Johannes,
> On Thu, May 9, 2019 at 1:46 PM Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
>> Hi Junio & Elijah,
>> On Thu, 9 May 2019, Junio C Hamano wrote:
>>> * en/fast-export-encoding (2019-05-07) 5 commits
>>> - fast-export: do automatic reencoding of commit messages only if requested
>>> - fast-export: differentiate between explicitly utf-8 and implicitly utf-8
>>> - fast-export: avoid stripping encoding header if we cannot reencode
>>> - fast-import: support 'encoding' commit header
>>> - t9350: fix encoding test to actually test reencoding
>>> The "git fast-export/import" pair has been taught to handle commits
>>> with log messages in encoding other than UTF-8 better.
>> This breaks on Windows, see
>> Sadly, I ran out of time looking at it in detail.
> Thanks for the heads up, and for taking some time to check it out.
> The error doesn't seem obvious from the log. Does Azure Pipelines
> have anything like CircleCI's "Debug with SSH" feature? (Where one
> can click a "Rerun job with SSH", and it'll restart the pipeline but
> also print out an ssh command someone can use to directly access the
> box on which the test is running, in order to be able to investigate.)
> Failing that, assuming I can find a Windows system somewhere, is
> there a list of steps for setting up a development environment and
> building git on Windows?
I'll just tell you why things go wrong here:
In these cases, a byte that is intended to be an ISO8859-something
character is passed via the command line. This cannot work as intended
on Windows, because the command line is not just a stream of bytes, but
a string of characters. On Windows (and presumably also on macOS), the
command line bytes are interpreted as UTF-8. As such, the bytes undergo
some encoding conversions between UTF-8 and UTF-16LE. That cannot work
when the bytes are not valid UTF-8.
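
To illustrate the point, here is a minimal Python sketch of what happens
to a lone ISO8859 byte when something treats it as UTF-8 and converts it
to UTF-16LE and back. The helper names are made up for this example, and
the use of a replacement character is an assumption; this is not the
conversion code Git itself uses on Windows.

```python
# Hypothetical illustration only; not Git's actual conversion code.

# 0xE9 is a single accented letter in ISO 8859-1, but on its own it is
# not a valid UTF-8 sequence (it is a lead byte with nothing following).
raw = b"\xe9"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def roundtrip_via_utf16(data: bytes) -> bytes:
    """Interpret data as UTF-8, convert to UTF-16LE, then back to UTF-8.

    Undecodable bytes are replaced with U+FFFD here (an assumption made
    for the sketch), so the original byte cannot be recovered.
    """
    text = data.decode("utf-8", errors="replace")
    wide = text.encode("utf-16-le")
    return wide.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    print(is_valid_utf8(raw))                 # is the byte valid UTF-8?
    print(roundtrip_via_utf16(raw) == raw)    # does it survive the trip?
```

Whatever the real conversion does with the invalid byte, the point is the
same: the byte that comes out the other end is not the byte that went in.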
To make the tests pass, you have to pass the ISO8859-something characters
via a file instead of the command line.
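
A rough sketch of that idea in Python, assuming a made-up helper and file
name: the bytes are written in binary mode, so nothing reinterprets them
on the way, and a command that reads its message from a file (for
instance `git commit -F <file>`) can then pick them up unchanged.

```python
# Hypothetical helper; the file name and message bytes are invented.
import os
import tempfile


def write_message_file(directory: str, message: bytes) -> str:
    """Write message to a file byte-for-byte and return its path."""
    path = os.path.join(directory, "msg.txt")
    with open(path, "wb") as f:   # binary mode: no encoding conversion
        f.write(message)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # 0xF0 stands in for some ISO8859-something character.
        path = write_message_file(tmp, b"Pi: \xf0\n")
        with open(path, "rb") as f:
            print(f.read() == b"Pi: \xf0\n")
```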