Web lists-archives.com

Re: git-rebase is ignoring working-tree-encoding




On Wed, Nov 07, 2018 at 05:38:18AM +0100, Adrián Gimeno Balaguer wrote:
> Hello Torsten,
> 
> Thanks for answering.
> 
> Answering to your question, I removed the comments with "rebase" since
> my reported encoding issue happens on more simpler operations
> (described in the PR), and the problem is not directly related to
> rebasing, so I considered it better in order to avoid unrelated
> confusions.
> 
> Let's get back to the problem. Each system has a default endianness.
> Also, in .gitattributes's working-tree-encoding, Git behaves
> differently depending on the attribute's value and the contents of the
> referenced entry file. When I put the value "UTF-16", then the file
> must have a BOM, or Git complains. Otherwise, if I put the value
> "UTF-16BE" or "UTF-16LE", then Git prohibites operations if file has a
> BOM for that main encoding (UTF-16 here), which can be relate to any
> endianness.
> 
> My very initial goal was, given a UTF-16LE file, to be able to view
> human-readable diffs whenever I make a change on it (and yes, it must
> be Little Endian). Plus, this file had a BOM. Now, what are the
> options with Git currently (consider only working-tree-encoding)? If I
> put working-tree-encoding=UTF-16, then I could view readable diffs and
> commit the file, but here is the main problem: Git looses information
> about what initial endianness the file had, therefore, after
> staging/committing it re-encodes the file from UTF-8 (as stored
> internally) to UTF-16 and the default system endianness. In my case it
> did to Big Endian, thus affecting the project's requirement. That is
> why I ended up writing a fixup script to change the encoding back to
> UTF-16LE.

OK, I think I understand your problem now.
The file format which you ask for could be named "UTF-16-BOM-LE",
but that does not exist in reality.
If you use UTF-16, then there must be a BOM, and if there is a BOM,
then a Unicode-aware application -should- be able to handle it.

Why does your project require such a format ?

> 
> On the other hand, once I set working-tree-encoding=UTF-16LE, then Git
> prohibited me from committing the file and even viewing human-readable
> diffs (the output simply tells it's a binary file). In this sense, the
> internal location of these  errors is within the function of utf8.c I
> made changes to in the PR. I hope I was clearer!
> 
> Finally, Git behaviour around this is based on Unicode standards,
> which is why I acknowledged that my changes violated them after
> refering to a link which is present in the ut8.h file.

[]