Web lists-archives.com

Re: Linux-kernel examples for LKMM recipes




Hi Paul,

On Wed, Oct 11, 2017 at 03:32:30PM -0700, Paul E. McKenney wrote:
> Hello!
> 
> At Linux Plumbers Conference, we got requests for a recipes document,
> and a further request to point to actual code in the Linux kernel.
> I have pulled together some examples for various litmus-test families,
> as shown below.  The decoder ring for the abbreviations (ISA2, LB, SB,
> MP, ...) is here:
> 
> 	https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf
> 
> This document is also checked into the memory-models git archive:
> 
> 	https://github.com/aparri/memory-model.git
> 
> I would be especially interested in simpler examples in general, and
> of course any example at all for the cases where I was unable to find
> any.  Thoughts?

Below are some examples we did discuss (at some point):

The comment in kernel/events/ring_buffer.c:perf_output_put_handle()
describes instances of MP+wmb+rmb and LB+ctrl+mb.

The comments in kernel/sched/core.c:try_to_wake_up() describes more
instances of MP ("plus locking") and LB (see finish_lock_switch()).

The comment in kernel/sched/core.c:task_rq_lock() describes an ins-
tance of MP+wmb+addr-acqpo.

The comment in include/linux/wait.h:waitqueue_active() describes an
instance of SB+mb+mb.

63cae12bce986 ("perf/core: Fix sys_perf_event_open() vs. hotplug")
describes an instance of W+RWC+porel+mb+mb.

[...]

I wish we could say "any barrier (explicit or implicit) in sources
is accompanied by a comment mentioning the interested pattern...",
but life is not always this simple. ;-)

  Andrea


> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> This document lists the litmus-test patterns that we have been discussing,
> along with examples from the Linux kernel.  This is intended to feed into
> the recipes document.  All examples are from v4.13.
> 
> 0.	Single-variable SC.
> 
> 	a.	Within a single CPU, the use of the ->dynticks_nmi_nesting
> 		counter by rcu_nmi_enter() and rcu_nmi_exit() qualifies
> 		(see kernel/rcu/tree.c).  The counter is accessed by
> 		interrupts and NMIs as well as by process-level code.
> 		This counter can be accessed by other CPUs, but only
> 		for debug output.
> 
> 	b.	Between CPUs, I would put forward the ->dflags
> 		updates, but this is anything but simple.  But maybe
> 		OK for an illustration?
> 
> 1.	MP (see test6.pdf for nickname translation)
> 
> 	a.	smp_store_release() / smp_load_acquire()
> 
> 		init_stack_slab() in lib/stackdepot.c uses release-acquire
> 		to handle initialization of a slab of the stack.  Working
> 		out the mutual-exclusion design is left as an exercise for
> 		the reader.
> 
> 	b.	rcu_assign_pointer() / rcu_dereference()
> 
> 		expand_to_next_prime() does the rcu_assign_pointer(),
> 		and next_prime_number() does the rcu_dereference().
> 		This mediates access to a bit vector that is expanded
> 		as additional primes are needed.  These two functions
> 		are in lib/prime_numbers.c.
> 
> 	c.	smp_wmb() / smp_rmb()
> 
> 		xlog_state_switch_iclogs() contains the following:
> 
> 			log->l_curr_block -= log->l_logBBsize;
> 			ASSERT(log->l_curr_block >= 0);
> 			smp_wmb();
> 			log->l_curr_cycle++;
> 
> 		And xlog_valid_lsn() contains the following:
> 
> 			cur_cycle = ACCESS_ONCE(log->l_curr_cycle);
> 			smp_rmb();
> 			cur_block = ACCESS_ONCE(log->l_curr_block);
> 
> 	d.	Replacing either of the above with smp_mb()
> 
> 		Holding off on this one for the moment...
> 
> 2.	Release-acquire chains, AKA ISA2, Z6.2, LB, and 3.LB
> 
> 	Lots of variety here, can in some cases substitute:
> 	
> 	a.	READ_ONCE() for smp_load_acquire()
> 	b.	WRITE_ONCE() for smp_store_release()
> 	c.	Dependencies for both smp_load_acquire() and
> 		smp_store_release().
> 	d.	smp_wmb() for smp_store_release() in first thread
> 		of ISA2 and Z6.2.
> 	e.	smp_rmb() for smp_load_acquire() in last thread of ISA2.
> 
> 	The canonical illustration of LB involves the various memory
> 	allocators, where you don't want a load from about-to-be-freed
> 	memory to see a store initializing a later incarnation of that
> 	same memory area.  But the per-CPU caches make this a very
> 	long and complicated example.
> 
> 	I am not aware of any three-CPU release-acquire chains in the
> 	Linux kernel.  There are three-CPU lock-based chains in RCU,
> 	but these are not at all simple, either.
> 
> 	Thoughts?
> 
> 3.	SB
> 
> 	a.	smp_mb(), as in lockless wait-wakeup coordination.
> 		And as in sys_membarrier()-scheduler coordination,
> 		for that matter.
> 
> 		Examples seem to be lacking.  Most cases use locking.
> 		Here is one rather strange one from RCU:
> 
> 		void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
> 		{
> 			unsigned long flags;
> 			bool needwake;
> 			bool havetask = READ_ONCE(rcu_tasks_kthread_ptr);
> 
> 			rhp->next = NULL;
> 			rhp->func = func;
> 			raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> 			needwake = !rcu_tasks_cbs_head;
> 			*rcu_tasks_cbs_tail = rhp;
> 			rcu_tasks_cbs_tail = &rhp->next;
> 			raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> 			/* We can't create the thread unless interrupts are enabled. */
> 			if ((needwake && havetask) ||
> 			    (!havetask && !irqs_disabled_flags(flags))) {
> 				rcu_spawn_tasks_kthread();
> 				wake_up(&rcu_tasks_cbs_wq);
> 			}
> 		}
> 
> 		And for the wait side, using synchronize_sched() to supply
> 		the barrier for both ends, with the preemption disabling
> 		due to raw_spin_lock_irqsave() serving as the read-side
> 		critical section:
> 
> 		if (!list) {
> 			wait_event_interruptible(rcu_tasks_cbs_wq,
> 						 rcu_tasks_cbs_head);
> 			if (!rcu_tasks_cbs_head) {
> 				WARN_ON(signal_pending(current));
> 				schedule_timeout_interruptible(HZ/10);
> 			}
> 			continue;
> 		}
> 		synchronize_sched();
> 
> 		-----------------
> 
> 		Here is another one that uses atomic_cmpxchg() as a
> 		full memory barrier:
> 
> 		if (!wait_event_timeout(*wait, !atomic_read(stopping),
> 					msecs_to_jiffies(1000))) {
> 			atomic_set(stopping, 0);
> 			smp_mb();
> 			return -ETIMEDOUT;
> 		}
> 
> 		int omap3isp_module_sync_is_stopping(wait_queue_head_t *wait,
> 						     atomic_t *stopping)
> 		{
> 			if (atomic_cmpxchg(stopping, 1, 0)) {
> 				wake_up(wait);
> 				return 1;
> 			}
> 
> 			return 0;
> 		}
>