
Re: Linux-kernel examples for LKMM recipes




On Thu, Oct 12, 2017 at 09:23:59AM +0800, Boqun Feng wrote:
> On Wed, Oct 11, 2017 at 10:32:30PM +0000, Paul E. McKenney wrote:
> > 	I am not aware of any three-CPU release-acquire chains in the
> > 	Linux kernel.  There are three-CPU lock-based chains in RCU,
> > 	but these are not at all simple, either.
> > 
> 
> The "Program-Order guarantees" case in scheduler? See the comments
> written by Peter above try_to_wake_up():
> 
>  * The basic program-order guarantee on SMP systems is that when a task [t]
>  * migrates, all its activity on its old CPU [c0] happens-before any subsequent
>  * execution on its new CPU [c1].
> ...
>  * For blocking we (obviously) need to provide the same guarantee as for
>  * migration. However the means are completely different as there is no lock
>  * chain to provide order. Instead we do:
>  *
>  *   1) smp_store_release(X->on_cpu, 0)
>  *   2) smp_cond_load_acquire(!X->on_cpu)
>  *
>  * Example:
>  *
>  *   CPU0 (schedule)  CPU1 (try_to_wake_up) CPU2 (schedule)
>  *
>  *   LOCK rq(0)->lock LOCK X->pi_lock
>  *   dequeue X
>  *   sched-out X
>  *   smp_store_release(X->on_cpu, 0);
>  *
>  *                    smp_cond_load_acquire(&X->on_cpu, !VAL);
>  *                    X->state = WAKING
>  *                    set_task_cpu(X,2)
>  *
>  *                    LOCK rq(2)->lock
>  *                    enqueue X
>  *                    X->state = RUNNING
>  *                    UNLOCK rq(2)->lock
>  *
>  *                                          LOCK rq(2)->lock // orders against CPU1
>  *                                          sched-out Z
>  *                                          sched-in X
>  *                                          UNLOCK rq(2)->lock
>  *
>  *                    UNLOCK X->pi_lock
>  *   UNLOCK rq(0)->lock
> 
> This is a chain that mixes locks with acquire-release (maybe even better?).
> 
> 
> Another example would be osq_{lock,unlock}() on multiple (more than
> three) CPUs.

I think the qrwlock also has something similar, now that the writer
fairness issue has been fixed:

CPU0: (writer doing an unlock)
smp_store_release(&lock->wlocked, 0);	// Bottom byte of lock->cnts


CPU1: (waiting writer on slowpath)
atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
...
arch_spin_unlock(&lock->wait_lock);


CPU2: (reader on slowpath)
arch_spin_lock(&lock->wait_lock);

and there are mixed-size accesses here too (a byte-sized store to
lock->wlocked against a word-sized read of lock->cnts). Fun stuff!
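
For anyone who wants to poke at the shape of this chain outside the kernel, here is a rough userspace sketch using C11 atomics and pthreads. It is not the qrwlock code: every demo_* name is made up for illustration, the wait condition is simplified to "the flag reads as 0" rather than VAL == _QW_WAITING, wait_lock is modelled as a plain int flag that starts out held by the waiting writer, and the mixed-size access is not modelled at all. The only thing it shows is that the plain write made before the first release is visible after the final acquire, two hops later.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not kernel identifiers. */
static int demo_payload;                 /* plain data published by "CPU0" */
static int demo_observed;                /* what "CPU2" ends up reading */
static atomic_int demo_wlocked = 1;      /* stands in for lock->wlocked */
static atomic_int demo_wait_lock = 1;    /* 1 = held by "CPU1" */

/* "CPU0": writer doing an unlock. */
static void *demo_writer_unlock(void *arg)
{
	(void)arg;
	demo_payload = 42;
	atomic_store_explicit(&demo_wlocked, 0, memory_order_release);
	return NULL;
}

/* "CPU1": waiting writer; acquire the flag, then release wait_lock. */
static void *demo_waiting_writer(void *arg)
{
	(void)arg;
	while (atomic_load_explicit(&demo_wlocked, memory_order_acquire) != 0)
		sched_yield();
	atomic_store_explicit(&demo_wait_lock, 0, memory_order_release);
	return NULL;
}

/* "CPU2": reader; take wait_lock with an acquire CAS, then read the data. */
static void *demo_reader(void *arg)
{
	int expected = 0;

	(void)arg;
	while (!atomic_compare_exchange_weak_explicit(&demo_wait_lock,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed)) {
		expected = 0;	/* a failed CAS overwrote it */
		sched_yield();
	}
	demo_observed = demo_payload;
	return NULL;
}

/* Run the three-thread chain once and return what the reader saw. */
int demo_run_chain(void)
{
	void *(*fn[3])(void *) = {
		demo_reader, demo_waiting_writer, demo_writer_unlock
	};
	pthread_t t[3];
	int i, rc;

	/* Reset state; pthread_create() orders this before the threads run. */
	demo_payload = 0;
	demo_observed = 0;
	atomic_store(&demo_wlocked, 1);
	atomic_store(&demo_wait_lock, 1);

	for (i = 0; i < 3; i++) {
		rc = pthread_create(&t[i], NULL, fn[i], NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < 3; i++)
		pthread_join(t[i], NULL);

	return demo_observed;
}
```

Calling demo_run_chain() from a small main() (built with something like -std=c11 -pthread) should return 42 every time, because happens-before in C11 is transitive across the two release/acquire pairs. Whether the LKMM gives the same answer for the real qrwlock, with the byte store and the word-sized read, is of course a separate question.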

Will