Re: [PATCH v3 5/5] fast-export: do automatic reencoding of commit messages only if requested
- Date: Mon, 13 May 2019 09:41:06 -0700
- From: Elijah Newren <newren@xxxxxxxxx>
On Mon, May 13, 2019 at 3:23 AM Johannes Schindelin wrote:
> Hi Elijah,
> On Sat, 11 May 2019, Elijah Newren wrote:
> > [...] the craziness is based on how Windows behaves; it seems insane to
> > me that Windows decides to munge user data (in the form of the command
> > line provided), so much so that it makes me wonder if I really
> > understood Hannes' and Dscho's explanations of what it is doing.
> It is not the user data that is munged by *Windows*, but by *Git for
> Windows*. The user data on Windows is encoded in UTF-16 (or some slight
> variant thereof). Git *cannot* handle UTF-16. Git's test suite *cannot*
> handle UTF-16. So we convert. That's all there is to it.
Ooh, it's Git for Windows doing the conversion? That means I can test
for the expected bytes with printf and grep; I only need to feed the
special bytes to git via a file instead of the command line. That's better.
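
To make the difference between the two routes concrete, here is a minimal
sketch in Python. It does not call git at all; the message text and the
UTF-16 to UTF-8 model of the command-line path are assumptions made for
illustration, based only on the description quoted above.

```python
import os
import tempfile

# Hypothetical commit message: "café" stored as ISO-8859-1,
# so the accented letter is the single byte 0xE9.
raw_message = b"caf\xe9\n"

# Route 1: bytes written to a file come back exactly as written.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(raw_message)
    path = handle.name
with open(path, "rb") as handle:
    from_file = handle.read()
os.remove(path)

# Route 2 (assumed model of the command-line path): the text is held as
# UTF-16 and then converted to UTF-8, so the accented letter becomes the
# two bytes 0xC3 0xA9 and the original encoding is lost.
as_utf16 = raw_message.decode("iso-8859-1").encode("utf-16-le")
from_argv = as_utf16.decode("utf-16-le").encode("utf-8")

print(from_file)
print(from_argv)
```

Only the file route keeps the byte a test would later look for, which is
why checking the output byte-for-byte becomes possible again.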
> P.S.: Of course it is not *all* there is to it. There is also a current
> code page which depends on the current user's current locale. We can
> definitely not rely on that, as Git has no idea about this and would quite
> positively produce incorrect output because of it. So we really just use
> the `*W()` functions of the Win32 API (i.e. the ones accepting wide
> Unicode characters and strings, i.e. UTF-16). I don't think we can do
> better than that.
Going off on a tangent for a minute...okay, so you need to do some
kind of conversion for Windows, but why is it automatically UTF-8 ->
UTF-16? In particular, if the i18n.commitencoding configuration option is
set (to something other than UTF-8), then couldn't you instead convert
the commit message specified on the command line from $(git config
i18n.commitencoding) to UTF-16? Or, perhaps convert from $(git config
i18n.commitencoding) to UTF-8 before the automatic UTF-8 -> UTF-16
conversion? It doesn't matter for this series anymore since I've
worked around it (by passing the bytes via an external file as
suggested by Hannes), but it seems like Git for Windows might still be
able to do better here. Or maybe I'm still not understanding the
full picture. Anyway, thanks for all the explanations and the help
getting this fixed up.
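
One reading of the question above, sketched in Python under stated
assumptions: the function name `wide_arg_to_bytes` and its
`commit_encoding` parameter are hypothetical, and this is not how Git for
Windows is implemented. It only shows what choosing the target encoding
from a configured value, rather than always UTF-8, would mean for the
resulting bytes.

```python
def wide_arg_to_bytes(wide_arg: bytes, commit_encoding: str = "utf-8") -> bytes:
    """Convert a UTF-16-LE argument into bytes in the requested encoding.

    Hypothetical helper: commit_encoding stands in for the value that
    `git config i18n.commitencoding` would report.
    """
    text = wide_arg.decode("utf-16-le")
    # Raises UnicodeEncodeError if the text cannot be represented
    # in the requested encoding.
    return text.encode(commit_encoding)


# The same wide-character argument, converted two ways.
wide = "caf\u00e9".encode("utf-16-le")
default_bytes = wide_arg_to_bytes(wide)               # always-UTF-8 behaviour
latin1_bytes = wide_arg_to_bytes(wide, "iso-8859-1")  # encoding taken from config

print(default_bytes)
print(latin1_bytes)
```

A conversion like this can fail when the argument contains characters the
configured encoding cannot represent, which UTF-8 never does; that may be
one reason to keep the fixed UTF-8 conversion.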