Re: en/fast-export-encoding, was Re: What's cooking in git.git (May 2019, #01; Thu, 9)




On 10.05.19 at 02:14, Elijah Newren wrote:
> Hi Johannes,
> 
> On Thu, May 9, 2019 at 1:46 PM Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
>>
>> Hi Junio & Elijah,
>>
>> On Thu, 9 May 2019, Junio C Hamano wrote:
>>
>>> * en/fast-export-encoding (2019-05-07) 5 commits
>>>  - fast-export: do automatic reencoding of commit messages only if requested
>>>  - fast-export: differentiate between explicitly utf-8 and implicitly utf-8
>>>  - fast-export: avoid stripping encoding header if we cannot reencode
>>>  - fast-import: support 'encoding' commit header
>>>  - t9350: fix encoding test to actually test reencoding
>>>
>>>  The "git fast-export/import" pair has been taught to handle commits
>>>  with log messages in encoding other than UTF-8 better.
>>
>> This breaks on Windows, see
>> https://dev.azure.com/gitgitgadget/git/_build/results?buildId=8298&view=ms.vss-test-web.build-test-results-tab
>>
>> Sadly, I ran out of time looking at it in detail.
> 
> Thanks for the heads up, and for taking some time to check it out.
> The error doesn't seem obvious from the log.  Does Azure Pipelines
> have anything like CircleCI's "Debug with SSH" feature[1]?  (Where one
> can click a "Rerun job with SSH", and it'll restart the pipeline but
> also print out an ssh command someone can use to directly access the
> box on which the test is running, in order to be able to investigate.)
>  Failing that, assuming I can find a Windows system somewhere, is
> there a list of steps for setting up a development environment and
> building git on Windows?

I'll just tell you why things go wrong here:

In these cases, a byte that is intended to be an ISO8859-something
character is passed via the command line. This cannot work as intended
on Windows, because the command line there is not just a stream of bytes,
but a string of characters. On Windows (and presumably also on macOS), the
command line bytes are interpreted as UTF-8. As such, the bytes undergo
encoding conversions between UTF-8 and UTF-16LE. That cannot work
when the bytes are not valid UTF-8 sequences.
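
To illustrate the round trip described above, here is a minimal Python
sketch. It only mimics a UTF-8 -> UTF-16LE -> UTF-8 conversion; the helper
name round_trip_via_utf16 and the sample message are made up for this
illustration, and the real conversion code on Windows may treat invalid
bytes differently (here they are replaced with U+FFFD).

```python
# Illustration only: mimics a UTF-8 -> UTF-16LE -> UTF-8 round trip in
# pure Python. This is NOT the conversion code Git for Windows uses.

def round_trip_via_utf16(raw: bytes) -> bytes:
    """Interpret raw as UTF-8, convert to UTF-16LE, then back to UTF-8."""
    # Assumption: invalid input bytes are replaced by U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    wide = text.encode("utf-16-le")  # what a wide-character API would see
    return wide.decode("utf-16-le").encode("utf-8")


# Hypothetical sample message containing one non-ASCII character.
latin1_message = "Pi\u00f1ata".encode("iso-8859-1")  # one byte: 0xF1
utf8_message = "Pi\u00f1ata".encode("utf-8")         # two bytes: 0xC3 0xB1

# Valid UTF-8 survives the round trip unchanged.
print(round_trip_via_utf16(utf8_message) == utf8_message)      # prints True
# The lone 0xF1 byte is not valid UTF-8, so the original bytes are lost.
print(round_trip_via_utf16(latin1_message) == latin1_message)  # prints False
```

The second comparison fails because the single ISO8859 byte comes back as
the three-byte UTF-8 encoding of U+FFFD, so the receiving program never
sees the byte the test intended to pass.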

To make the tests pass, you have to pass the ISO8859-something characters
via a file instead of the command line.
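
As a rough sketch of why the file route is safe: only the file name
travels on the command line, and the receiving process reads the bytes
itself. The child process below is a stand-in written for this
illustration (it is not git), and the file name msg.txt is arbitrary.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical message with one ISO8859-1 byte (0xF1) that is not valid UTF-8.
message = "Pi\u00f1ata".encode("iso-8859-1")

# Stand-in child program: reads the file named in argv[1] in binary mode
# and echoes its bytes unchanged on stdout.
child = "import sys; sys.stdout.buffer.write(open(sys.argv[1], 'rb').read())"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "msg.txt")
    with open(path, "wb") as f:
        f.write(message)  # raw bytes go into the file, not onto the command line

    # Only the path is passed as an argument; the payload stays in the file.
    result = subprocess.run(
        [sys.executable, "-c", child, path],
        check=True,
        capture_output=True,
    )
    received = result.stdout

# The bytes arrive intact because no command-line re-encoding touched them.
print(received == message)
```

File contents are opaque bytes to the operating system, so no encoding
conversion is applied to them on the way to the child process.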

-- Hannes