Re: [PATCH v7 19/31] merge-recursive: add get_directory_renames()
- Date: Sat, 3 Feb 2018 18:04:39 -0800
- From: Elijah Newren <newren@xxxxxxxxx>
On Sat, Feb 3, 2018 at 2:32 PM, Elijah Newren <newren@xxxxxxxxx> wrote:
> On Fri, Feb 2, 2018 at 5:02 PM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
>> On Tue, Jan 30, 2018 at 3:25 PM, Elijah Newren <newren@xxxxxxxxx> wrote:
>>> +	while (*--end_of_new == *--end_of_old &&
>>> +	       end_of_old != old_path &&
>>> +	       end_of_new != new_path)
>>> +		; /* Do nothing; all in the while loop */
>> We have to compare manually as we'd want to find
>> the first non-equal and there doesn't seem to be a good
>> library function for that.
>> Assuming many repos are UTF8 (including in their paths),
>> how does this work with display characters longer than one char?
>> It should be fine as we cut at the slash?
> Oh, UTF-8. Ugh.
> Can UTF-8 characters, other than '/', have a byte whose value matches
> (unsigned char)('/')? If so, then I'll need to figure out how to do
> utf-8 character parsing. Anyone have pointers?
Well, after digging around for a while, I found this claim on the
Wikipedia page for UTF-8:

    Since ASCII bytes do not occur when encoding non-ASCII code points
    into UTF-8, UTF-8 is safe to use within most programming and document
    languages that interpret certain ASCII characters in a special way,
    such as "/" in filenames, "\" in escape sequences, and "%" in printf.

In other words, every byte of a multi-byte UTF-8 sequence has its high
bit set, so a byte equal to '/' can only ever be a real slash. So,
unless I'm reading something wrong here, I think that means this code
is just fine as it is.
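
For anyone who wants to convince themselves, here is a rough standalone
sketch (not part of the patch) that checks the claim by brute force and
then shows a byte-wise search for '/' on a UTF-8 path. The helper names
utf8_encode(), non_ascii_never_yields_ascii_bytes() and dirname_len()
are made up for illustration; the encoder follows the standard UTF-8 bit
layout and, for simplicity, does not skip the surrogate range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: encode one code point (<= 0x10FFFF) as UTF-8.
 * Returns the number of bytes written to out (1 to 4).
 */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
	assert(cp <= 0x10FFFF);
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	out[0] = (unsigned char)(0xF0 | (cp >> 18));
	out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (cp & 0x3F));
	return 4;
}

/*
 * Returns 1 if no non-ASCII code point produces a byte below 0x80
 * (which would include '/'), 0 otherwise.
 */
static int non_ascii_never_yields_ascii_bytes(void)
{
	unsigned long cp;
	unsigned char buf[4];
	size_t i, n;

	for (cp = 0x80; cp <= 0x10FFFF; cp++) {
		n = utf8_encode(cp, buf);
		for (i = 0; i < n; i++)
			if (buf[i] < 0x80)
				return 0;
	}
	return 1;
}

/*
 * Hypothetical helper: byte length of the directory part of path,
 * found with a plain byte-wise search for the last '/'.
 * Returns 0 if there is no slash.
 */
static size_t dirname_len(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? (size_t)(slash - path) : 0;
}
```

Calling dirname_len("src/\xc3\xa9t\xc3\xa9/file.c") (that is, "src/été/file.c"
spelled out in bytes) should land on the slash after the two-byte 'é'
sequences rather than somewhere inside them, which is the property the
while loop above relies on.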