
Re: long I/O delays when strace is running




Daniel Santos wrote:
I've tracked it down to this little Sleep() loop in pinfo::init.

      bool created = shloc != SH_JUSTOPEN;

      /* Detect situation where a transitional memory block is being retrieved.
         If the block has been allocated with PINFO_REDIR_SIZE but not yet
         updated with a PID_EXECED state then we'll retry. */
      if (!created && !(flag & PID_NEW))
        /* If not populated, wait 2 seconds for procinfo to become populated.
           Would like to wait with finer granularity but that is not easily
           doable.  */
        for (int i = 0; i < 200 && !procinfo->ppid; i++)
          Sleep (10);

I tried putting a stupid memory barrier in the loop and a volatile read just for
kicks, but that doesn't seem to be the problem.  I'm headed off to bed.  This
only happens when using strace, so if anybody has ideas please post.

I can reproduce your issue on a real 64-bit Windows 7 machine, which rules out a virtual machine as the root cause. I was running 'top -s1' in one window while running your testcase in another. top froze for many seconds at a time, caught its display up, then froze again, over and over. It stayed frozen for a while even after your testcase had ended (!), then caught up. Your mention of pinfo::init and 'ps', along with my use of 'top', leads me to think this may be related to the /proc filesystem.

Here's my humble contribution to the discussion:

~ time w
 02:15:52 up 3 days, 20:34,  0 users,  load average: 0.99, 0.62, 0.31
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT

real    0m0.203s    <-- OK, nice and fast
user    0m0.077s
sys     0m0.139s

~ time strace -o w.out w
 02:16:23 up 3 days, 20:34,  0 users,  load average: 0.54, 0.55, 0.29
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT

real    0m28.487s   <-- but stracing it is much, much slower
user    0m0.015s
sys     0m0.000s

The 'w' command is normally pretty fast (about 0.2 seconds here). Running it under strace takes about 28.5 seconds, roughly 140 times longer, which is unreasonable. Something seems busted somewhere. The strace output for this example shows many delays of ~3.1 seconds each, which seem to occur while w is accumulating process time information for all processes.
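
One possible reading of the ~3.1-second figure, not confirmed anywhere in this thread: the pinfo::init loop quoted above calls Sleep (10) up to 200 times, which is nominally 2 seconds. If each Sleep (10) is rounded up to one full Windows timer tick, and the tick is the usual default of about 15.6 ms (1/64 s), the loop would instead run for roughly 3.1 seconds before giving up. The sketch below only does that arithmetic. The helper name total_wait_ms, the rounding rule, and the 15.625 ms tick are assumptions made for illustration, not anything taken from the Cygwin source.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Cygwin): estimated total time spent in a
// polling loop when every sleep request is rounded up to whole timer ticks.
double total_wait_ms (int iterations, double requested_ms, double tick_ms)
{
  assert (iterations >= 0 && requested_ms >= 0.0 && tick_ms > 0.0);
  // Assumption: a sleep of requested_ms lasts ceil(requested_ms / tick_ms)
  // ticks, i.e. it never wakes before the next tick boundary.
  double ticks_per_sleep = std::ceil (requested_ms / tick_ms);
  return iterations * ticks_per_sleep * tick_ms;
}
```

Under these assumptions, total_wait_ms (200, 10.0, 15.625) gives 3125 ms, while a 1 ms tick gives 2000 ms, the "2 seconds" named in the source comment. If this reading is right, each ~3.1-second stall would be one full timeout of that loop, i.e. procinfo->ppid never became non-zero for that process while strace was running.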

..mark

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple