
Re: t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux)

[Please skip using Reply-To and instead use Mail-Followup-To so that
responses also go to the list.]

On Thu, Feb 07, 2019 at 10:59:35PM +0100, Kevin Daudt wrote:
> I'm trying to get the git test suite passing on Alpine Linux, which is
> based on musl libc.
> 
> All tests in t0028-working-tree-encoding.sh are currently failing,
> because musl iconv does not support stateful output of UTF-16/32 (e.g.,
> it does not output a BOM), while git expects that to be present:
> 
> > hint: The file 'test.utf16' is missing a byte order mark (BOM). Please
> > use UTF-16BE or UTF-16LE (depending on the byte order) as
> > working-tree-encoding.
> > fatal: BOM is required in 'test.utf16' if encoded as utf-16
> 
> Because adding the file to git fails, all the other tests fail as well,
> since they expect the file to be present in the repository.
> 
> Any idea how to get around this?
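The failure quoted above comes from Git refusing content whose working-tree-encoding is the byte-order-neutral label "UTF-16" when no BOM is present. The following is a minimal sketch of that kind of check, not Git's actual code: the function names `missing_required_utf16_bom` and `ascii_iequal` are hypothetical, and the sketch covers UTF-16 only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: ASCII case-insensitive string comparison. */
static bool ascii_iequal(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
    return *a == *b; /* true only if both strings ended together */
}

/* Hypothetical check (not Git's code): returns true when the declared
 * encoding is plain "UTF-16" and the data does not begin with a BOM.
 * UTF-16BE and UTF-16LE name their byte order, so they need no BOM. */
static bool missing_required_utf16_bom(const char *enc,
                                       const unsigned char *data, size_t len)
{
    assert(enc != NULL);
    if (!ascii_iequal(enc, "UTF-16"))
        return false;
    if (len >= 2 &&
        ((data[0] == 0xFE && data[1] == 0xFF) ||  /* big-endian BOM */
         (data[0] == 0xFF && data[1] == 0xFE)))   /* little-endian BOM */
        return false;
    return true;
}
```

With a BOM-less buffer such as the one musl's iconv produces, this check reports the BOM as missing, which matches the "BOM is required" error in the quoted output.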

I think musl needs to patch their libc. RFC 2781 says that if there's no
BOM in UTF-16, then "the text SHOULD be interpreted as being
big-endian."
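The RFC 2781 rule quoted above can be sketched as a small byte-order detector for data labelled plain "UTF-16": honor a BOM if there is one, otherwise fall back to big-endian. This is an illustrative sketch; the names `utf16_order` and `utf16_detect_order` are hypothetical and do not come from Git, musl, or the RFC.

```c
#include <assert.h>
#include <stddef.h>

enum utf16_order { UTF16_BE, UTF16_LE };

/* Hypothetical sketch of the RFC 2781 rule for text labelled "UTF-16":
 * FE FF means big-endian, FF FE means little-endian, and anything else
 * SHOULD (not MUST) be read as big-endian. *bom_len receives the number
 * of leading bytes to skip: 2 if a BOM was found, 0 otherwise. */
static enum utf16_order utf16_detect_order(const unsigned char *buf, size_t len,
                                           size_t *bom_len)
{
    assert(bom_len != NULL);
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return UTF16_LE;
    }
    *bom_len = 0;
    return UTF16_BE; /* no BOM: the RFC's big-endian default */
}
```

The final `return UTF16_BE` is exactly the step that the Windows-based programs mentioned below decline to follow.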

Unfortunately for all of us, many Windows-based programs have chosen to
ignore that advice (technically, it's only a SHOULD) and interpret it as
little-endian instead. Git can't safely assume anything about the
endianness of a UTF-16 stream that doesn't contain a BOM. Technically,
since the RFC doesn't specify a MUST requirement, musl can't, either.
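To make the ambiguity concrete: without a BOM, the same two bytes are a valid code unit in either byte order, so the content alone cannot settle the question. The bytes `41 00` are "A" (U+0041) when read as little-endian and U+4100, a CJK ideograph, when read as big-endian. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: assemble one 16-bit code unit from two bytes. */
static uint16_t unit_be(const unsigned char b[2])
{
    return (uint16_t)((b[0] << 8) | b[1]); /* first byte is high-order */
}

static uint16_t unit_le(const unsigned char b[2])
{
    return (uint16_t)((b[1] << 8) | b[0]); /* first byte is low-order */
}
```

Both results are legitimate characters, which is why neither Git nor a libc can safely guess.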

Even if Git were to produce a BOM to work around this issue, we'd still
have the problem that any program using musl will write UTF-16 data
without a BOM. Moreover, because musl, in violation of the RFC, doesn't
read and process BOMs, someone using little-endian UTF-16 (with a proper
BOM) with musl and Git will have their data corrupted, according to my
reading of the musl website.
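The corruption scenario can be illustrated with a toy decoder that either honors a leading BOM or, as an assumed model of a reader that does not process BOMs, reads every byte pair as big-endian. This is not musl's code, and the name `decode_units` is hypothetical; it returns raw code units and does no surrogate-pair handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy decoder: writes UTF-16 code units into out[].
 * honor_bom != 0: a leading BOM selects the byte order and is dropped.
 * honor_bom == 0: every pair is read big-endian and the BOM is kept as
 * data (an assumed model of a reader that ignores BOMs).
 * Returns the number of code units written. */
static size_t decode_units(const unsigned char *buf, size_t len, int honor_bom,
                           uint16_t *out, size_t out_cap)
{
    int little = 0;
    size_t i = 0, n = 0;

    assert(buf != NULL || len == 0);
    if (honor_bom && len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) {
            little = 1;
            i = 2;
        } else if (buf[0] == 0xFE && buf[1] == 0xFF) {
            i = 2;
        }
    }
    for (; i + 1 < len && n < out_cap; i += 2)
        out[n++] = little ? (uint16_t)((buf[i + 1] << 8) | buf[i])
                          : (uint16_t)((buf[i] << 8) | buf[i + 1]);
    return n;
}
```

For the little-endian input `FF FE 41 00` ("A" with a proper BOM), the BOM-aware path yields the single unit U+0041, while the BOM-ignoring path yields U+FFFE followed by U+4100: the noncharacter that a byte-swapped BOM produces, followed by the wrong character.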

In other words, I believe this test is failing legitimately.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
