Re: [PATCH V2 1/2] Fix delta integer overflows
- Date: Fri, 11 Aug 2017 11:40:55 -0700
- From: Junio C Hamano <gitster@xxxxxxxxx>
- Subject: Re: [PATCH V2 1/2] Fix delta integer overflows
Martin Koegler <martin.koegler@xxxxxxxxx> writes:
> On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote:
>> > The current delta code produces incorrect pack objects for files > 4GB.
>> > Signed-off-by: Martin Koegler <martin.koegler@xxxxxxxxx>
>> I am a bit torn on this change.
>> Given that this is not merely a local storage format but it also is
>> an interchange format, we would probably want to make sure that the
>> receiving end (e.g. get_delta_hdr_size() that is used at the
>> beginning of patch_delta()) on a platform whose size_t is smaller
>> than that of a platform that produced the delta stream with this
>> code behaves "sensibly".
> Overflows would already be detected during unpack:
> * Assuming size_t = uint32, the system should just be able to handle up to 4GB of process memory.
> So loading any source blob larger than 4GB should already fail.
After re-reading patch_delta(), I agree. The loading of the base
object (i.e. the procedure to prepare src_buf & src_size fed to the
function) would have failed already.
> * Assuming size_t = uint32 and a source blob size < 4 GB, the
> target blob size would be readed truncated. apply_delta checks,
> that the generated result matches the encoded size - this check
> would fail.
When target size is truncated, we allocate much less than the real
target size, and start reading the input. But patch_delta() makes
sure that what is copied into the buffer, either by copying bytes
literally from the pack data or by copying from src_buf, will never
cause the resulting dst_buf overflow (especially after your change
updates "size" to size_t), so we should be safe on this side, too.
>> If we replaced ulong we use in create/patch delta codepaths with
>> uint32_t, that would be safer, just because the encoder would not be
>> able to emit varint that is larger than the receivers to handle.
>> But that defeats the whole point of using varint() to encode the
>> sizes in the first place. It was partly done for space saving, but
>> more for allowing larger sizes and larger ulong in the future
>> without having to change the file format.
> The ondisk-format is able to handle larger sizes [using a slightly worse compression].
> The current implementation is just buggy.
> I would not move to uint32_t. The remaing part of git uses "unsigned long", so the
> delta code could still be called with larger files.
Oh, absolutely. I was merely commenting on the lack of any error
checking in get_delta_hdr_size() helper function, and dismissing a
naive move to limiting the file format by insisting on uint32_t as
an unworkable workaround.