Re: t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux)
- Date: Sat, 9 Feb 2019 09:09:40 +0100
- From: Torsten Bögershausen <tboegi@xxxxxx>
- Subject: Re: t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux)
On 08.02.19 07:04, Rich Felker wrote:
> On Fri, Feb 08, 2019 at 12:17:05AM +0000, brian m. carlson wrote:
>> Even if Git were to produce a BOM to work around this issue, then we'd
>> still have the problem that any program using musl will write data in
>> UTF-16 without a BOM. Moreover, because musl, in violation of the RFC,
>> doesn't read and process BOMs, someone using little-endian UTF-16 (with
>> a proper BOM) with musl and Git will have their data corrupted,
>> according to my reading of the musl website.
> That information is outdated and someone from our side should update
> it; since 1.1.19, musl treats "UTF-16" input as ambiguous endianness
> determined by BOM, defaulting to big if there's no BOM. However output
> is always big endian, such that processes conforming to the Unicode
> SHOULD clause will interpret it correctly.
> The portable way to get little endian with a BOM is to open a
> conversion descriptor for "UTF-16LE" (which should not add any BOM)
> and write a BOM manually.
That will be possible in the upcoming version of Git:
Merge: cfd9167c15 aab2a1ae48
Author: Junio C Hamano <gitster@xxxxxxxxx>
Date: Wed Feb 6 22:05:21 2019 -0800

    Merge branch 'tb/utf-16-le-with-explicit-bom'

    A new encoding UTF-16LE-BOM has been invented to force encoding to
    UTF-16 with BOM in little endian byte order, which cannot be directly
    generated by using iconv.

    Support working-tree-encoding "UTF-16LE-BOM"
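
For illustration, the manual approach Rich describes (convert to "UTF-16LE", which adds no BOM, then write the BOM yourself) can be sketched as below. This is only a rough analogy using Python's built-in codecs rather than an iconv conversion descriptor, and it is not how Git implements UTF-16LE-BOM; the function names are made up for the example.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    """Encode text as little-endian UTF-16 with an explicit leading BOM.

    Hypothetical helper: the "utf-16-le" codec fixes the byte order and
    emits no BOM of its own, so the BOM bytes (FF FE) are prepended by hand.
    """
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 data that starts with a BOM.

    Python's generic "utf-16" decoder consumes a leading BOM and uses it
    to select the byte order.
    """
    return data.decode("utf-16")


if __name__ == "__main__":
    encoded = encode_utf16le_with_bom("hi")
    print(encoded.hex(" "))
    print(decode_utf16(encoded))
```

A reader that honors the BOM should round-trip such data regardless of its own default byte order, which is the point of writing the BOM explicitly.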