Web lists-archives.com

Re: git-rebase is ignoring working-tree-encoding

El dom., 4 nov. 2018 a las 16:48, brian m. carlson
(<sandals@xxxxxxxxxxxxxxxxxxxx>) escribió:
> Do things work for you if you write this as "UTF-16LE"?  When you use
> working-tree-encoding, the file is stored internally as UTF-8, but it's
> serialized to the specified encoding when written out.

When I use UTF-16LE or UTF-16BE, then I can't commit or view diffs of
specified files, as Git prohibites BOM existance in these cases,
showing an error when attempting to commit. But BOM must also exist
for the project. I even experimented for fixing this issue within
Git's source. It turns out that Git is following an Unicode rule that
says that BOM is not permitted when declaring exact UTF-16BE/UTF-16LE
MIME (and UTF-32 variants) encoding types:


> Asking for "UTF-16" is ambiguous: there are two endiannesses, and so as
> long as you get a BOM in the output, either one is an acceptable option.
> Which one you get is dependent on what the underlying code thinks is the
> default, and traditionally for Unix systems and Unix tools that's been
> big-endian.  If you want a particular endianness, you should specify it.

I wrote a "counterpart" easy fix which instead only prohibites BOM for
the opposite endianness (for example if
working-tree-encoding=UTF-16LE, then finding an UTF-16BE BOM in the
file would cause Git to signal the error right before committing,
diffing, etc.). That way user would be encouraged to modify the file's
encoding to match the one specified in working-tree-encoding before
allowing these actions, therefore preventing Git from encoding to the
wrong endianness after file is written out. With few repository tests,
this new behaviour worked as expected. But then I realized this
solution would perhaps be unacceptable for Git's source code as it
would violate that Unicode standard. Anyways, here is a PR in my Git
fork with the changes I did, for reference:


Ah this point, the solution I came with recently for my project was
writing some code in Shell to fix the endianness of the re-encoded
files to UTF-16BE after the Git's write out process (or a "working
tree refresh" in my own words), within the same script that I use to
pack assets including the localization files.

> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204