Web lists-archives.com

Re: [PATCH] tty: make n_tty_read() always abort if hangup is in progress




On Tue, Feb 13, 2018 at 07:38:08AM -0800, Tejun Heo wrote:
> A tty is hung up by __tty_hangup() setting file->f_op to
> hung_up_tty_fops, which is skipped on ttys whose write operation isn't
> tty_write().  This means that, for example, /dev/console whose write
> op is redirected_tty_write() is never actually marked hung up.
> 
> Because n_tty_read() uses the hung up status to decide whether to
> abort the waiting readers, the lack of hung-up marking can lead to the
> following scenario.
> 
>  1. A session contains two processes.  The leader and its child.  The
>     child ignores SIGHUP.
> 
>  2. The leader exits and starts disassociating from the controlling
>     terminal (/dev/console).
> 
>  3. __tty_hangup() skips setting f_op to hung_up_tty_fops.
> 
>  4. SIGHUP is delivered and ignored.
> 
>  5. tty_ldisc_hangup() is invoked.  It wakes up the waits which should
>     clear the read lockers of tty->ldisc_sem.
> 
>  6. The reader wakes up but because tty_hung_up_p() is false, it
>     doesn't abort and goes back to sleep while read-holding
>     tty->ldisc_sem.
> 
>  7. The leader progresses to tty_ldisc_lock() in tty_ldisc_hangup()
>     and is now stuck in D sleep indefinitely waiting for
>     tty->ldisc_sem.
> 
> The following is Alan's explanation on why some ttys aren't hung up.
> 
>  http://lkml.kernel.org/r/20171101170908.6ad08580@alans-desktop
> 
>  1. It broke the serial consoles because they would hang up and close
>     down the hardware. With tty_port that *should* be fixable properly
>     for any cases remaining.
> 
>  2. The console layer was (and still is) completely broken and doens't
>     refcount properly. So if you turn on console hangups it breaks (as
>     indeed does freeing consoles and half a dozen other things).
> 
> As neither can be fixed quickly, this patch works around the problem
> by introducing a new flag, TTY_HUPPING, which is used solely to tell
> n_tty_read() that hang-up is in progress for the console and the
> readers should be aborted regardless of the hung-up status of the
> device.
> 
> 
> The following is a sample hung task warning caused by this issue.
> 
>   INFO: task agetty:2662 blocked for more than 120 seconds.
>         Not tainted 4.11.3-dbg-tty-lockup-02478-gfd6c7ee-dirty #28
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>       0  2662      1 0x00000086
>   Call Trace:
>    __schedule+0x267/0x890
>    schedule+0x36/0x80
>    schedule_timeout+0x23c/0x2e0
>    ldsem_down_write+0xce/0x1f6
>    tty_ldisc_lock+0x16/0x30
>    tty_ldisc_hangup+0xb3/0x1b0
>    __tty_hangup+0x300/0x410
>    disassociate_ctty+0x6c/0x290
>    do_exit+0x7ef/0xb00
>    do_group_exit+0x3f/0xa0
>    get_signal+0x1b3/0x5d0
>    do_signal+0x28/0x660
>    exit_to_usermode_loop+0x46/0x86
>    do_syscall_64+0x9c/0xb0
>    entry_SYSCALL64_slow_path+0x25/0x25
> 
> The following is the repro.  Run "$PROG /dev/console".  The parent
> process hangs in D state.
> 
>   #include <sys/types.h>
>   #include <sys/stat.h>
>   #include <sys/wait.h>
>   #include <sys/ioctl.h>
>   #include <fcntl.h>
>   #include <unistd.h>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <errno.h>
>   #include <signal.h>
>   #include <time.h>
>   #include <termios.h>
> 
>   int main(int argc, char **argv)
>   {
> 	  struct sigaction sact = { .sa_handler = SIG_IGN };
> 	  struct timespec ts1s = { .tv_sec = 1 };
> 	  pid_t pid;
> 	  int fd;
> 
> 	  if (argc < 2) {
> 		  fprintf(stderr, "test-hung-tty /dev/$TTY\n");
> 		  return 1;
> 	  }
> 
> 	  /* fork a child to ensure that it isn't already the session leader */
> 	  pid = fork();
> 	  if (pid < 0) {
> 		  perror("fork");
> 		  return 1;
> 	  }
> 
> 	  if (pid > 0) {
> 		  /* top parent, wait for everyone */
> 		  while (waitpid(-1, NULL, 0) >= 0)
> 			  ;
> 		  if (errno != ECHILD)
> 			  perror("waitpid");
> 		  return 0;
> 	  }
> 
> 	  /* new session, start a new session and set the controlling tty */
> 	  if (setsid() < 0) {
> 		  perror("setsid");
> 		  return 1;
> 	  }
> 
> 	  fd = open(argv[1], O_RDWR);
> 	  if (fd < 0) {
> 		  perror("open");
> 		  return 1;
> 	  }
> 
> 	  if (ioctl(fd, TIOCSCTTY, 1) < 0) {
> 		  perror("ioctl");
> 		  return 1;
> 	  }
> 
> 	  /* fork a child, sleep a bit and exit */
> 	  pid = fork();
> 	  if (pid < 0) {
> 		  perror("fork");
> 		  return 1;
> 	  }
> 
> 	  if (pid > 0) {
> 		  nanosleep(&ts1s, NULL);
> 		  printf("Session leader exiting\n");
> 		  exit(0);
> 	  }
> 
> 	  /*
> 	   * The child ignores SIGHUP and keeps reading from the controlling
> 	   * tty.  Because SIGHUP is ignored, the child doesn't get killed on
> 	   * parent exit and the bug in n_tty makes the read(2) block the
> 	   * parent's control terminal hangup attempt.  The parent ends up in
> 	   * D sleep until the child is explicitly killed.
> 	   */
> 	  sigaction(SIGHUP, &sact, NULL);
> 	  printf("Child reading tty\n");
> 	  while (1) {
> 		  char buf[1024];
> 
> 		  if (read(fd, buf, sizeof(buf)) < 0) {
> 			  perror("read");
> 			  return 1;
> 		  }
> 	  }
> 
> 	  return 0;
>   }
> 
> 
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Alan Cox <alan@llwyncelyn.cymru>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
> Hello,
> 
> This patch was initially posted several months ago as a sort of RFC.
> 
>  http://lkml.kernel.org/r/20171101020651.GK3252168@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 
> IIUC Alan's explanation correctly, this workaround actually seems like
> the right thing to do, so I refreshed the patch, added some comments
> and updated the patch description accordingly.
> 
> We've been running this in the fleet for the past couple months to
> avoid the process hang issues and it hasn't caused any issues.  Greg,
> can you please pick it up?

Yes, I will, give me a few days to catch up on my patch queue.

thanks,

greg k-h