Re: git archive generates tar with malformed pax extended attribute

On 25.05.19 at 23:07, Ævar Arnfjörð Bjarmason wrote:
> On Sat, May 25 2019, René Scharfe wrote:
>> We could truncate symlink targets at the first NUL as well in git
>> archive -- but that would be a bit sad, as the archive formats allow
>> storing the "real" target from the repo, with NUL and all.

> But that being said, this assumption that data in a tar archive will get
> written to a FS of some sort isn't true. There's plenty of consumers of
> the format that read it in-memory and stream its contents out to
> something else entirely, e.g. taking "git archive --remote" output,
> parsing it with e.g. [1] and throwing some/all of the content into a
> database.
> 1. https://metacpan.org/pod/Archive::Tar

Git archive writes link targets that are 100 characters long or less
into the appropriate field in the plain tar header.  It copies
everything, including NULs, but unlike a PAX extended header that field
lacks a length indicator, so extractors basically have to treat it as
NUL-terminated.

If we want to preserve NUL in short link targets as well, we'd have to
put such names into a PAX extended header.
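
To illustrate the difference, here is a rough sketch with two hypothetical
helpers (linkname_field_len and format_pax_record are made-up names, not
code from archive-tar.c).  The first shows what a reader can recover from
the fixed 100-byte field, the second how a pax record carries its own
length.  It assumes the usual pax record layout "<len> <keyword>=<value>\n":

```c
#include <stdio.h>
#include <string.h>

#define LINKNAME_FIELD_SIZE 100	/* linkname field size in a ustar header */

/*
 * Hypothetical reader-side helper: the fixed field carries no length,
 * so a reader can only take the bytes before the first NUL (or the
 * whole field if there is none).
 */
static size_t linkname_field_len(const char *field)
{
	const char *nul = memchr(field, '\0', LINKNAME_FIELD_SIZE);

	return nul ? (size_t)(nul - field) : LINKNAME_FIELD_SIZE;
}

/*
 * Hypothetical writer-side helper: format one pax record,
 * "<len> <keyword>=<value>\n", where <len> is the decimal size of the
 * whole record including its own digits.  The value is copied by
 * length, so embedded NULs survive.  Returns the record size, or 0 if
 * out is too small.
 */
static size_t format_pax_record(char *out, size_t outsz, const char *keyword,
				const char *value, size_t valuelen)
{
	/* ' ' + keyword + '=' + value + '\n', without the length digits */
	size_t body = 1 + strlen(keyword) + 1 + valuelen + 1;
	size_t digits = 1, limit = 10, total;
	int n;

	/* Add digits until the total fits in that many digits. */
	while (body + digits >= limit) {
		digits++;
		limit *= 10;
	}
	total = body + digits;
	if (total > outsz)
		return 0;

	n = snprintf(out, outsz, "%zu %s=", total, keyword);
	if (n < 0)
		return 0;
	memcpy(out + n, value, valuelen);
	out[n + valuelen] = '\n';
	return total;
}
```

For a three-byte target "a\0b", the fixed field yields only "a" to a
reader, while the pax record "16 linkpath=a\0b\n" keeps all three bytes.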

 archive-tar.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/archive-tar.c b/archive-tar.c
index 3e53aac1e6..e8f55578d1 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -291,7 +291,8 @@ static int write_tar_entry(struct archiver_args *args,

 	if (S_ISLNK(mode)) {
-		if (size > sizeof(header.linkname)) {
+		if (size > sizeof(header.linkname) ||
+		    memchr(buffer, '\0', size)) {
 			xsnprintf(header.linkname, sizeof(header.linkname),
 				  "see %s.paxheader", oid_to_hex(oid));
 			strbuf_append_ext_header(&ext_header, "linkpath",