Web lists-archives.com

Re: long I/O delays when strace is running




Daniel Santos wrote:

On 04/21/2017 05:12 PM, Mark Geisert wrote:

Re debugging strace itself, you may not realize that strace is not a
Cygwin-native program.  It's a Windows-native program.  So debugging it with
Cygwin gdb is problematic.

I can tell you roughly how strace operates.  It launches the target program
with CreateProcess(), not with exec().  It acts as the debugger of the target
in order to receive Windows DEBUG_EVENTs for things like DLL loads, thread
creation, etc. etc.  The Cygwin DLL in the target can tell when it's being run
under strace and generates trace lines with OutputDebugString() calls that
strace receives as DEBUG_EVENTs.  strace then just outputs the trace line.

strace does have to explicitly ContinueDebugEvent() on each DEBUG_EVENT it
receives, and if there's some issue with that then the target program will
remain suspended awaiting that continue. I've never seen an issue with this
but it's another possible place to look, I suppose.

Why strace appears to interact badly with /proc accesses, I have no idea.

..mark

Well thank you, I wish I had read this earlier. I've been trying to debug (with
gdb) strace (following children) and now I know why the debugger is blowing
through breakpoints, because I may be debugging strace, but strace debugging
it's own child.  I added a _pinfo::dump () member function that I'm calling from
pinfo::init () after the test "if (!created && !(flag & PID_NEW))" and it's
crashing because for some processes, the entire _pinfo struct isn't mapped in,
just the first page.  Of course, I'm a newbie in this sphere, so maybe I've
missed something and there's sometimes a reason for this?  (The first part of
the struct looks normal.) Otherwise, that would be another odd flaw.

I don't know enough about pinfo.h and pinfo.cc to say anything definite on this.

Anyway, I can see that the strace process's shared _pinfo object is never fully
populated:

_pinfo 0x30000 {
   pid 2800,
   process_state 0x00000001,
   ppid 0,
   exitcode 0
   cygstarted 0,
   dwProcessId 0x00000AF0,
   progname "D:\cygwin64\bin\strace.exe",
   uid 0,
   gid 0,
   pgid 0,
   sid 0,
   ctty 0,
   has_pgid_children 0
   start_time 1492881370,
   nice 0,
   stopsig 0,
   sendsig 0x0,
   exec_sendsig 0x0,
   exec_dwProcessId 0
}

Again, strace.exe is a Windows executable, so perhaps some of those fields don't make sense for a non-Cygwin process and are not initialized? Purely speculation on my part.

So I would venture to say that is a problem.  Also, pinfo::init() should
probably issure some error message if it waits 2-ish seconds and the struct
still isn't correctly populated.

That seems right. I unfortunately don't know why the code presumes the struct is always populated within a certain (small) amount of time. The complaint comment about minimum possible sleep duration sure makes it seem like it's always supposed to be populated very quickly.

Is there a way to debug the children of strace?  It would make it a lot easier.
That's part of why I wrote the _pinfo::debug(), but also when I debug strace
with gdb, the _pinfo struct IS properly populated.  The best problems are the
ones that disappear when you try to debug them.

strace always acts as the debugger of the target process you start strace with (or attach to; see '-p' in strace's help). strace has a switch '-f' == '--trace-children' that defaults to being ON. So by default strace is getting DEBUG_EVENTs from the target strace launched as well as any process the target creates.

If you explicitly set the '-f' flag, you're actually turning OFF that default and *only* the target process sends DEBUG_EVENTs. In that case any process the target creates will be invisible to strace. You could conceivably debug those sub-processes with gdb but you likely won't be able to catch them at their startup unless they wait for your attach.

..mark


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple