Re: t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux)

On Fri, Feb 08, 2019 at 01:04:03AM -0500, Rich Felker wrote:
> That information is outdated and someone from our side should update
> it; since 1.1.19, musl treats "UTF-16" input as ambiguous endianness
> determined by BOM, defaulting to big if there's no BOM. However output
> is always big endian, such that processes conforming to the Unicode
> SHOULD clause will interpret it correctly.

It's good to hear that musl now supports parsing UTF-16 BOMs.
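
For illustration, the input behavior described above (BOM decides the byte order, big endian when there is no BOM) can be sketched in a few lines. This is only a Python sketch of the rule, not musl's iconv code, and the function name decode_utf16_default_be is made up for the example.

```python
import codecs


def decode_utf16_default_be(data: bytes) -> str:
    """Decode UTF-16 bytes: honor a BOM if present, else assume big endian.

    Hypothetical helper that mirrors the rule described for "UTF-16" input;
    it is not taken from musl or from Git.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        # FF FE: little endian payload follows the BOM
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        # FE FF: big endian payload follows the BOM
        return data[2:].decode("utf-16-be")
    # No BOM: fall back to big endian
    return data.decode("utf-16-be")
```

Note that Python's own "utf-16" codec falls back to the native byte order when there is no BOM, which is why the sketch spells out the big endian default explicitly.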

> The portable way to get little endian with a BOM is to open a
> conversion descriptor for "UTF-16LE" (which should not add any BOM)
> and write a BOM manually.

Right, I think my point is that we have existing systems which we know
ignore the SHOULD and assume something different. Perhaps in retrospect,
it would have been better to use MUST to specify areas where
interoperability is a concern.
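
As a rough sketch of the approach you describe (convert with an explicitly little endian encoding that adds no BOM, then write the BOM by hand), here is the same idea using Python's codecs rather than an iconv conversion descriptor. The function name encode_utf16le_with_bom is hypothetical.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    """Return text as UTF-16LE bytes preceded by a manually written BOM.

    Hypothetical helper: the "utf-16-le" codec emits no BOM of its own,
    so the BOM (FF FE) is prepended explicitly.
    """
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Because the byte order is named explicitly, the result does not depend on what a given implementation chooses for plain "UTF-16" output.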

> In any case, this test seems mainly relevant to Windows users wanting
> to store source files in UTF-16LE with BOM. This doesn't really make
> sense to do on a Linux/musl system, so I'm not sure any action is
> needed here from either side.

I do know that some people use CIFS or the like to share repositories
between Unix and Windows. However, I agree that such people aren't
likely to use UTF-16 on Unix systems. The working tree encoding
functionality also supports other encodings which musl may or may not
support.

If you and your users are comfortable with the fact that the test (and
the corresponding functionality) won't work as expected with UTF-16,
then I agree that no action is needed.
--
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
