[PATCH] convert: avoid malloc of original file size

From: Joey Hess <id@xxxxxxxxxx>

We write the output of a "clean" filter into a strbuf. Rather than
growing the strbuf dynamically as we read its output, we make the
initial allocation as large as the original input file. This is a good
guess when the filter is just tweaking a few bytes, but it's disastrous
when the point of the filter is to condense a very large file into a
short identifier (e.g., the way git-lfs and git-annex do). We may ask to
allocate many gigabytes, causing the allocation to fail and Git to
die().

Instead, let's just let strbuf do its usual growth.
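
To illustrate the difference between a size hint and on-demand growth,
here is a minimal sketch of a read loop. It is not Git's strbuf code:
"struct growbuf" and growbuf_read() are hypothetical names, and the
doubling policy is an assumption taken from the description in this
message. A hint of 0 means "start small"; a non-zero hint is allocated
up front, which is where a multi-gigabyte request can fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for a strbuf: heap buffer, used length, capacity. */
struct growbuf {
	char *buf;
	size_t len;
	size_t alloc;
};

/*
 * Read fd until EOF into gb. "hint" only sizes the first allocation;
 * 0 means "unknown", so we start at 8192 bytes and grow as data arrives.
 * Returns 0 on success, -1 on read or allocation failure. The caller
 * frees gb->buf in either case.
 */
static int growbuf_read(struct growbuf *gb, int fd, size_t hint)
{
	assert(gb && !gb->buf);	/* sketch expects a zeroed, unused growbuf */

	gb->alloc = hint ? hint : 8192;
	gb->buf = malloc(gb->alloc);
	if (!gb->buf)
		return -1;	/* an oversized hint fails here, before any read */

	for (;;) {
		ssize_t got;

		if (gb->len == gb->alloc) {
			/* buffer is full: double it (assumed growth policy) */
			size_t next = gb->alloc * 2;
			char *p = realloc(gb->buf, next);

			if (!p)
				return -1;
			gb->buf = p;
			gb->alloc = next;
		}

		got = read(fd, gb->buf + gb->len, gb->alloc - gb->len);
		if (got < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (got == 0)
			return 0;	/* EOF */
		gb->len += (size_t)got;
	}
}
```

With a hint of 0, a filter that emits a short identifier never causes
more than the initial 8192-byte allocation, no matter how large the
input file was.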

When the clean filter does output something around the same size as the
worktree file, the buffer will need to be reallocated until it fits,
starting at 8192 and doubling in size. Benchmarking indicates that
reallocation is not a significant overhead for outputs up to a
few MB in size.
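
As a rough check on the reallocation cost, the sketch below counts how
many doublings are needed to reach a given output size. It takes the
"start at 8192 and double" description above at face value; the
function name doublings_needed() is hypothetical, and the growth policy
actually used by strbuf may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many times a buffer that starts at 8192 bytes must double
 * before it can hold "need" bytes. Stops early rather than overflowing
 * size_t for absurdly large requests.
 */
static unsigned doublings_needed(size_t need)
{
	size_t alloc = 8192;
	unsigned n = 0;

	while (alloc < need && alloc <= SIZE_MAX / 2) {
		alloc *= 2;
		n++;
	}
	return n;
}
```

Under these assumptions a 4 MiB output takes 9 doublings and a 1 GiB
output takes 17, so the number of reallocations grows only
logarithmically with the output size.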

Signed-off-by: Joey Hess <id@xxxxxxxxxx>
Signed-off-by: Jeff King <peff@xxxxxxxx>
---
This is a resurrection of the patch from:


It got stalled on discussion of the commit message, which I've rewritten
here to match the suggestions in the thread.

As discussed there, I do think this only solves half the problem, as the
smudge filter has the same issue in reverse. That's more complicated to
fix, and AFAIK nobody is working on it. But I don't think there's any
reason not to pick up this part in the meantime.

 convert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/convert.c b/convert.c
index 5d0307fc10..94ff837649 100644
--- a/convert.c
+++ b/convert.c
@@ -731,7 +731,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 	if (start_async(&async))
 		return 0;	/* error was already reported */
 
-	if (strbuf_read(&nbuf, async.out, len) < 0) {
+	if (strbuf_read(&nbuf, async.out, 0) < 0) {
 		err = error(_("read from external filter '%s' failed"), cmd);
 	}
 	if (close(async.out)) {