Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
- Date: Wed, 9 Jan 2019 22:06:08 +0100
- From: Martin Ågren <martin.agren@xxxxxxxxx>
On Wed, 9 Jan 2019 at 08:37, Martin Ågren <martin.agren@xxxxxxxxx> wrote:
> On Tue, 8 Jan 2019 at 00:34, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > * bc/sha-256 (2018-11-14) 12 commits
> > - hash: add an SHA-256 implementation using OpenSSL
> > - sha256: add an SHA-256 implementation using libgcrypt
> > - Add a base implementation of SHA-256 support
> > - commit-graph: convert to using the_hash_algo
> > - t/helper: add a test helper to compute hash speed
> > - sha1-file: add a constant for hash block size
> > - t: make the sha1 test-tool helper generic
> > - t: add basic tests for our SHA-1 implementation
> > - cache: make hashcmp and hasheq work with larger hashes
> > - hex: introduce functions to print arbitrary hashes
> > - sha1-file: provide functions to look up hash algorithms
> > - sha1-file: rename algorithm to "sha1"
> > Add sha-256 hash and plug it through the code to allow building Git
> > with the "NewHash".
> AddressSanitizer barks at current pu (855f98be272f19d16564e) for a
> handful of tests.
> One example is t5702-protocol-v2.sh. [...]
> ==1691823==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x6040000004f2 at pc 0x0000004ea0fd bp 0x7ffc53082590 sp
> READ of size 32 at 0x6040000004f2 thread T0
> #0 0x4ea0fc in __asan_memcpy
> #1 0x8603ec in oidset_insert oidset.c
> #2 0x86c977 in add_promisor_object packfile.c:2129:4
> #3 0x86c07a in for_each_object_in_pack packfile.c:2070:7
> #4 0x86c535 in for_each_packed_object packfile.c:2095:7
> #5 0x86c651 in is_promisor_object packfile.c:2151:4
> 0x6040000004f2 is located 0 bytes to the right of 34-byte region
> allocated by thread T0 here:
> #0 0x4eb4cf in malloc
> #1 0x9fa1db in do_xmalloc wrapper.c:60:8
> #2 0x9fa2fd in do_xmallocz wrapper.c:100:8
> #3 0x9fa2fd in xmallocz_gently wrapper.c:113
> #4 0x86a877 in unpack_compressed_entry packfile.c:1588:11
> #5 0x86a02e in unpack_entry packfile.c:1737:11
> #6 0x867431 in cache_or_unpack_entry packfile.c:1439:10
> #7 0x867431 in packed_object_info packfile.c:1506
> #8 0x96b7be in oid_object_info_extended sha1-file.c:1394:10
> #9 0x96d7d0 in read_object sha1-file.c:1434:6
> #10 0x96d7d0 in read_object_file_extended sha1-file.c:1476
> #11 0x85cf40 in repo_read_object_file ./object-store.h:174:9
> #12 0x85cf40 in parse_object object.c:273
> #13 0x86c752 in add_promisor_object packfile.c:2108:23
> #14 0x86c07a in for_each_object_in_pack packfile.c:2070:7
> #15 0x86c535 in for_each_packed_object packfile.c:2095:7
> #16 0x86c651 in is_promisor_object packfile.c:2151:4
I found some more time to look into this.
It seems we have a buffer with raw data and we set up a `struct
object_id *` pointing into it, at a (supposed) OID value. Then
`update_tree_entry_internal()` verifies that the buffer contains
sufficiently many bytes, i.e., at least `the_hash_algo->rawsz` (=20).
We then immediately call `oidset_insert()`, which copies an entire struct,
i.e., `sizeof(struct object_id)` (=32) bytes. That is 12 more than what is
known to be safe. For this particular input data, we read outside
allocated memory.
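To make the arithmetic concrete, here is a minimal sketch of the size mismatch. It is not Git's actual code: `demo_object_id`, `DEMO_MAX_RAWSZ`, `DEMO_SHA1_RAWSZ` and both helper functions are hypothetical stand-ins for `struct object_id`, `GIT_MAX_RAWSZ` and `the_hash_algo->rawsz`, assuming the struct is nothing more than a hash array sized for the largest supported hash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for GIT_MAX_RAWSZ and the_hash_algo->rawsz. */
#define DEMO_MAX_RAWSZ 32
#define DEMO_SHA1_RAWSZ 20

/* Simplified model of struct object_id: sized for the largest hash. */
struct demo_object_id {
	unsigned char hash[DEMO_MAX_RAWSZ];
};

/*
 * Number of bytes a whole-struct copy would read past the end of a
 * buffer that has only `remaining` valid bytes at the OID position.
 */
static size_t demo_struct_copy_overread(size_t remaining)
{
	size_t need = sizeof(struct demo_object_id);

	return need > remaining ? need - remaining : 0;
}

/*
 * Bounded copy: read only `rawsz` bytes from the raw buffer and zero
 * the rest of the destination, so nothing beyond the verified bytes
 * is touched.
 */
static void demo_oid_from_raw(struct demo_object_id *dst,
			      const unsigned char *raw, size_t rawsz)
{
	assert(rawsz <= DEMO_MAX_RAWSZ);
	memset(dst, 0, sizeof(*dst));
	memcpy(dst->hash, raw, rawsz);
}
```

With only `DEMO_SHA1_RAWSZ` bytes verified, `demo_struct_copy_overread()` reports the 12-byte gap described above, while `demo_oid_from_raw()` stays within the verified range.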
I can think of three possible approaches:
* Allocate with a margin (GIT_MAX_RAWSZ - the_hash_algo->rawsz) where
"necessary" (TM). Maybe not so maintainable.
* Teach `oidset_insert()` (i.e., khash) to only copy
`the_hash_algo->rawsz` bytes. Maybe not so good for performance.
I wonder which of these is the least awful, or if there are other ideas.
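For illustration, the margin and bounded-copy ideas above could look roughly like the following. This is only a sketch under stated assumptions: `sketch_oid`, `SKETCH_MAX_RAWSZ`, `sketch_alloc_with_margin()` and `sketch_copy_oid()` are hypothetical names, not functions in Git or khash, and the real call sites would need more care than shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_RAWSZ 32	/* hypothetical stand-in for GIT_MAX_RAWSZ */

/* Simplified model of struct object_id. */
struct sketch_oid {
	unsigned char hash[SKETCH_MAX_RAWSZ];
};

/*
 * Margin approach: over-allocate by (max - rawsz) bytes so that a
 * full-struct read starting at any rawsz-sized OID inside the first
 * `len` bytes stays within the allocation.
 */
static unsigned char *sketch_alloc_with_margin(size_t len, size_t rawsz)
{
	assert(rawsz <= SKETCH_MAX_RAWSZ);
	/* calloc zero-fills, so the margin bytes have a defined value. */
	return calloc(1, len + (SKETCH_MAX_RAWSZ - rawsz));
}

/*
 * Bounded-copy approach: copy only `rawsz` bytes into the stored entry
 * instead of assigning the whole struct.
 */
static void sketch_copy_oid(struct sketch_oid *dst,
			    const unsigned char *src, size_t rawsz)
{
	assert(rawsz <= SKETCH_MAX_RAWSZ);
	memset(dst, 0, sizeof(*dst));
	memcpy(dst->hash, src, rawsz);
}
```

The first helper has to be used at every allocation that may later be read through a struct pointer, which is the maintainability concern; the second replaces a fixed-size struct copy with a runtime-sized one at every insertion.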