
Add a new fchmodat4() syscall




I spent half of dinner last night being complained to by one of our
hardware engineers about Linux's lack of support for the flags argument
to fchmodat().  This all came about because of a FUSE filesystem
implementation, and while there are some application-specific
workarounds for the issue it seemed to me like the cleanest bet was to
just go add another fchmodat() that supports flags to the kernel.

The actual implementation is super simple: essentially it's just
the same as fchmodat(), but LOOKUP_FOLLOW is conditionally set based on
the flags.  I've attempted to make this match "man 2 fchmodat" as
closely as possible, which says EINVAL is returned for invalid flags (as
opposed to ENOTSUP, which is currently returned by glibc for
AT_SYMLINK_NOFOLLOW).  I have a sketch of a glibc patch that I haven't
even compiled yet, but it seems fairly straightforward:

    diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c
    index 6d9cbc1ce9e0..b1beab76d56c 100644
    --- a/sysdeps/unix/sysv/linux/fchmodat.c
    +++ b/sysdeps/unix/sysv/linux/fchmodat.c
    @@ -29,12 +29,36 @@
     int
     fchmodat (int fd, const char *file, mode_t mode, int flag)
     {
    -  if (flag & ~AT_SYMLINK_NOFOLLOW)
    -    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
    -#ifndef __NR_lchmod		/* Linux so far has no lchmod syscall.  */
    +  /* There are four paths through this code:
    +      - The flags are zero.  In this case it's fine to call fchmodat.
    +      - The flags are non-zero and glibc doesn't have access to
    +	__NR_fchmodat4.  In this case all we can do is emulate the error codes
    +	defined by the glibc interface from userspace.
    +      - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has
    +	fchmodat4.  This is the simplest case, as the fchmodat4 syscall exactly
    +	matches glibc's library interface so it can be called directly.
    +      - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does
    +	not.  In this case we must respect the error codes defined by the glibc
    +	interface instead of returning ENOSYS.
    +    The intent here is to ensure that the kernel is called at most once per
    +    library call, and that the error types defined by glibc are always
    +    respected.  */
    +
    +#ifdef __NR_fchmodat4
    +  long result;
    +#endif
    +
    +  if (flag == 0)
    +    return INLINE_SYSCALL (fchmodat, 3, fd, file, mode);
    +
    +#ifdef __NR_fchmodat4
    +  result = INLINE_SYSCALL (fchmodat4, 4, fd, file, mode, flag);
    +  if (result == 0 || errno != ENOSYS)
    +    return result;
    +#endif
    +
       if (flag & AT_SYMLINK_NOFOLLOW)
         return INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOTSUP);
    -#endif
     
    -  return INLINE_SYSCALL (fchmodat, 3, fd, file, mode);
    +  return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
     }

I've never added a new syscall before so I'm not really sure what the
proper procedure to follow is.  I'm assuming any new syscall will
involve fairly significant discussion, so I've just done the minimum of
an implementation for this patch set.  Specifically, I've:

* Defined a new syscall that looks like fchmodat but includes a flag
  argument, which I'm calling fchmodat4 because it has 4 arguments.  I
  don't know if that's the correct naming convention, and don't really
  have any skin in that game.
* Implemented that syscall by extending the fchmod code to handle flags,
  which is pretty straightforward.  I think it's sane, but given that
  it's so simple I'm not sure if I'm missing something -- specifically,
  I didn't go check to make sure the semantics of AT_SYMLINK_NOFOLLOW
  match !LOOKUP_FOLLOW.  I'm assuming they do, but sometimes when I look
  at something and say "that's so simple, how is it broken" I'm actually
  just missing something entirely.
* Added an asm-generic syscall number for this, which I assume I'm
  supposed to do first, since it looks like we're trying to keep the
  numbers in sync everywhere.
* Added x86 syscalls for this so I could test it.

I also cleaned up a checkpatch issue in fchmodat().  I only found this
because I copied the fchmodat() interface for fchmodat4() and it threw
the warning; I don't personally care either way whether or not the
space is in there.

I've given this fairly minimal testing.  Essentially all I've done is
booted up 5.1.6 with this patch set on my local development box and run

    $ touch test-file
    $ ln -s test-file test-link
    $ cat > test.c
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    
    int main(int argc, char **argv)
    {
            long out;
    
            out = syscall(428, AT_FDCWD, "test-file", 0x888, AT_SYMLINK_NOFOLLOW);
            printf("fchmodat4(AT_FDCWD, \"test-file\", 0x888, AT_SYMLINK_NOFOLLOW): %ld\n", out);
    
            out = syscall(428, AT_FDCWD, "test-file", 0x888, 0);
            printf("fchmodat4(AT_FDCWD, \"test-file\", 0x888, 0): %ld\n", out);
    
            out = syscall(268, AT_FDCWD, "test-file", 0x888);
            printf("fchmodat(AT_FDCWD, \"test-file\", 0x888): %ld\n", out);
    
            out = syscall(428, AT_FDCWD, "test-link", 0x888, AT_SYMLINK_NOFOLLOW);
            printf("fchmodat4(AT_FDCWD, \"test-link\", 0x888, AT_SYMLINK_NOFOLLOW): %ld\n", out);
    
            out = syscall(428, AT_FDCWD, "test-link", 0x888, 0);
            printf("fchmodat4(AT_FDCWD, \"test-link\", 0x888, 0): %ld\n", out);
    
            out = syscall(268, AT_FDCWD, "test-link", 0x888);
            printf("fchmodat(AT_FDCWD, \"test-link\", 0x888): %ld\n", out);
    
            return 0;
    }
    $ gcc test.c -o test
    $ ./test
    fchmodat4(AT_FDCWD, "test-file", 0x888, AT_SYMLINK_NOFOLLOW): 0
    fchmodat4(AT_FDCWD, "test-file", 0x888, 0): 0
    fchmodat(AT_FDCWD, "test-file", 0x888): 0
    fchmodat4(AT_FDCWD, "test-link", 0x888, AT_SYMLINK_NOFOLLOW): -1
    fchmodat4(AT_FDCWD, "test-link", 0x888, 0): 0
    fchmodat(AT_FDCWD, "test-link", 0x888): 0

While I don't think there's anything unacceptable about what's here, I
don't really consider this finished.  I couldn't find a cookbook for
"here's how you add a system call", but all I really did was "git grep
add | grep syscall", so if there's something out there then please let me
know and I'll follow it.  Specifically, I haven't:

* Added any sort of documentation.  I didn't find anything with a "git
  grep fchmodat", so I'm assuming it's just the man pages that are
  relevant here.
* Fixed any of the other architectures.  I'm assuming this is just the
  mechanical process of fixing all these in the same way I did for x86.

      arch/alpha/kernel/syscalls/syscall.tbl:461      common  fchmodat                        sys_fchmodat
      arch/arm/tools/syscall.tbl:333  common  fchmodat                sys_fchmodat
      arch/arm64/include/asm/unistd32.h:#define __NR_fchmodat 333
      arch/arm64/include/asm/unistd32.h:__SYSCALL(__NR_fchmodat, sys_fchmodat)
      arch/ia64/kernel/fsys.S:        data8 0                         // fchmodat
      arch/ia64/kernel/syscalls/syscall.tbl:268       common  fchmodat                        sys_fchmodat
      arch/m68k/kernel/syscalls/syscall.tbl:299       common  fchmodat                        sys_fchmodat
      arch/microblaze/kernel/syscalls/syscall.tbl:306 common  fchmodat                        sys_fchmodat
      arch/mips/kernel/syscalls/syscall_n32.tbl:262   n32     fchmodat                        sys_fchmodat
      arch/mips/kernel/syscalls/syscall_n64.tbl:258   n64     fchmodat                        sys_fchmodat
      arch/mips/kernel/syscalls/syscall_o32.tbl:299   o32     fchmodat                        sys_fchmodat
      arch/parisc/kernel/syscalls/syscall.tbl:286     common  fchmodat                sys_fchmodat
      arch/powerpc/kernel/syscalls/syscall.tbl:297    common  fchmodat                        sys_fchmodat
      arch/s390/kernel/syscalls/syscall.tbl:299  common       fchmodat                sys_fchmodat                    sys_fchmodat
      arch/sh/include/uapi/asm/unistd_64.h:#define __NR_fchmodat              334
      arch/sh/kernel/syscalls/syscall.tbl:306 common  fchmodat                        sys_fchmodat
      arch/sh/kernel/syscalls_64.S:   .long sys_fchmodat
      arch/sparc/kernel/syscalls/syscall.tbl:295      common  fchmodat                sys_fchmodat
      arch/xtensa/kernel/syscalls/syscall.tbl:300     common  fchmodat                        sys_fchmodat
* Looked at anything in tools.  Again, I'm assuming it's just a
  mechanical process of looking at all of these and adding fchmodat4.

      tools/include/nolibc/nolibc.h:#ifdef __NR_fchmodat
      tools/include/nolibc/nolibc.h:  return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0);
      tools/include/uapi/asm-generic/unistd.h:#define __NR_fchmodat 53
      tools/include/uapi/asm-generic/unistd.h:__SYSCALL(__NR_fchmodat, sys_fchmodat)
      tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:297  common  fchmodat                        sys_fchmodat
      tools/perf/arch/s390/entry/syscalls/syscall.tbl:299  common     fchmodat                sys_fchmodat                    compat_sys_fchmodat
      tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:268   common  fchmodat                __x64_sys_fchmodat
      tools/perf/builtin-trace.c:     { .name     = "fchmodat",

* Done anything with userspace, aside from thinking about the glibc code
  above.  I'd assume that I'm meant to bring in libc-alpha to the
  discussion, but I didn't want to do so this early in case this was
  just a non-starter.

I'm happy dealing with all of that, but given that I'm assuming there's
going to be some discussion I wanted to send out the proof-of-concept
first to see if this has any legs.  Aside from the glibc side the
remaining work smells pretty mechanical, so I figured I'd wait on that
until I knew it wasn't going to be a waste of time -- partially because
I'm lazy, but mostly because I just realized I blew my whole morning
working on this when all I really wanted to do was avoid discussing
fchmodat in the first place :)