Web lists-archives.com

Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long

On Sun, Nov 18, 2018 at 03:18:52PM -0500, Derrick Stolee wrote:
> On 11/17/2018 10:11 AM, tboegi@xxxxxx wrote:
> >From: Torsten Bögershausen <tboegi@xxxxxx>
> >
> >Currently Git users can not commit files >4Gib under 64 bit Windows,
> >where "long" is 32 bit but size_t is 64 bit.
> >Improve the code base in small steps, as small as possible.
> >What started with a small patch to replace "unsigned long" with size_t
> >in one file (convert.c) ended up with a change in many files.
> >
> >Signed-off-by: Torsten Bögershausen <tboegi@xxxxxx>
> >---
> >
> >This needs to go on top of pu, to cover all the good stuff
> >   cooking here.
> Better to work on top of 'master', as the work in 'pu' will be rewritten
> several times, probably.
> >I have started this series on November 1st, since that 2 or 3 rebases
> >   had been done to catch up, and now it is on pu from November 15.
> >
> >I couldn't find a reason why changing "unsigned ling"
> >   into "size_t" may break anything, any thoughts, please ?
> IIRC, the blocker for why we haven't done this already is that "size_t",
> "timestamp_t" and "off_t" are all 64-bit types that give more implied
> meaning, so we should swap types specifically to these as we want. One
> example series does a specific, small change [1].
> If we wanted to do a single swap that removes 'unsigned long' with an
> unambiguously 64-bit type, I would recommend "uint64_t". Later replacements
> to size_t, off_t, and timestamp_t could happen on a case-by-case basis for
> type clarity.
> It may benefit reviewers to split this change into multiple patches,
> starting at the lowest levels of the call stack (so higher 'unsigned long's
> can up-cast to the 64-bit types when calling a function) and focusing the
> changes to one or two files at a time (X.c and X.h, preferrably).
> Since you are talking about the benefits for Git for Windows to accept 4GB
> files, I wonder if we can add a test that verifies such a file will work. If
> you have such a test, then I could help verify that the test fails before
> the change and succeeds afterward.
> Finally, it may be good to add a coccinelle script that replaces 'unsigned
> long' with 'uint64_t' so we can easily fix any new introductions that happen
> in the future.

The plan was never to change "unsigned long" to a 64 bit value in general.
The usage of "unsigned long" (instead of size_t) was (and is) still good
for 32bit systems, where both are 32 bit. (at least all system I am aware of).

For 64 bit systems like Linux or Mac OS it is the same, both are 64 bit.

The only problematic system is Win64, where "unsigned long" is 32 bit,
and therefore we must use size_t to address data in memory.
This is not to be confused with off_t, which is used for "data on disk"
(and nothing else) or timestamp_t which is used for timestamps (and nothing else).

I haven't followed the "coccinelle script" development at all, if someone
makes a patch do replace "unsigned long" with size_t, that could replace
my whole patch. (Some of them may be downgraded to "unsigned int" ?)

However, as we need to let  tb/print-size-t-with-uintmax-format make it to
master, otherwise we are not able to print the variables in a portable way. 

> Thanks! I do think we should make this change, but we must be careful. It
> may be disruptive to topics in flight.
> -Stolee
> [1] https://public-inbox.org/git/20181112084031.11769-1-carenas@xxxxxxxxx/