Web lists-archives.com

Re: [PATCH 5/5] run-command: add note about forking and threading




Hi,

Brandon Williams wrote:

> --- a/run-command.c
> +++ b/run-command.c
> @@ -458,6 +458,14 @@ int start_command(struct child_process *cmd)
>  		argv_array_pushv(&argv, cmd->argv);
>  	}
>  
> +	/*
> +	 * NOTE: In order to prevent deadlocking when using threads special
> +	 * care should be taken with the function calls made in between the
> +	 * fork() and exec() calls.  No calls should be made to functions which
> +	 * require acquiring a lock (e.g. malloc) as the lock could have been
> +	 * held by another thread at the time of forking, causing the lock to
> +	 * never be released in the child process.
> +	 */
>  	cmd->pid = fork();

Why can't git use e.g. posix_spawn to avoid this?

fork()-ing in a threaded context is very painful for maintainability.
Any library function you are using could start taking a lock, and then
you have a deadlock.  So you have to make use of a very small
whitelisted list of library functions for this to work.

The function calls you have to audit are not only between fork() and
exec() in the normal control flow.  You have to worry about signal
handlers, too.

By comparison, posix_spawn() takes care of this all.  I really
strongly do not want to move in the direction of more fork() in a
multi-threaded context.  I'd rather turn off threads in the submodule
case of grep altogether, but that shouldn't be needed:

* it is possible to launch a command without fork+exec
* if that's not suitable, then using fork introduces parallelism that
  interacts much better with fork+exec than threads do

I really don't want to go down this path.  I've said that a few times.
I'm saying it again now.

My two cents,
Jonathan