Re: Hangs on connect to UNIX socket being listened on in the same process (was: Cygwin hanging in pselect)




Hi Corinna,

Thanks for the response.

On Mon, Jan 9, 2017 at 3:13 PM, Corinna Vinschen
<corinna-cygwin@xxxxxxxxxx> wrote:
> Hi Erik,
>
> On Jan  9 14:29, Erik Bray wrote:
>> On Mon, Jan 9, 2017 at 12:01 PM, Erik Bray <erik.m.bray@xxxxxxxxx> wrote:
>> > On Fri, Jan 6, 2017 at 12:40 PM, Erik Bray <erik.m.bray@xxxxxxxxx> wrote:
>> >> Hello, and happy new-ish year,
>> >>
>> >> I've been working on and off over the past few months on bringing
>> >> Python's compatibility with Cygwin up to snuff, including having all
>> >> pertinent tests passing.  I've noticed that there are several tests
>> >> (which I currently skip) that cause the process to hang indefinitely,
>> >> and not respond to any signals from Cygwin (it can only be killed from
>> >> Windows).  This is Cygwin 64-bit--I have not tested 32-bit.
>> >> [...]
>> > I made a little bit of progress debugging this, but now I'm stumped.
>> > It seems the problem is this:
>> >
>> > For each socket whose fd is passed to select() a thread_socket is
>> > started which calls peek_socket until there are bits ready on the
>
> Yes and no.  One thread_socket is called per 62 sockets, to account
> for the maximum number of handles per WaitForMultipleObjects call.
>
>> > socket, or until the timeout is reached.  This in turn calls
>> > fhandler_socket::evaluate_events.
>> > [...]
>> After playing around with this a bit more I came up with a much
>> simpler example.  This has nothing to do with select( ) at all,
>> directly.
>
> Right.  It has to do with how connect/accept works on AF_LOCAL sockets.
> The handshake doesn't work well for situations like yours, where the
> same thread tries to connect and accept on the same socket.

Actually, I'm no longer entirely sure that that's the issue, even
considering that this has come up before.  Or at the very least,
there's an additional issue.  I realized that when I tried separate
client/server processes, I had put an accept() call at the end of the
server so it would block there.  With the server waiting to accept a
connection, connect() succeeded.  However, when I replaced the accept()
with a long sleep(), the client's connect() never returned.
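
For reference, here is a minimal sketch of the no-accept() case,
collapsed into a single process.  The function name and the socket path
passed to it are placeholders made up for illustration, not code from
the Python test suite.  It binds and listens on an AF_UNIX socket and
then connects to it without ever calling accept().  On Linux the
connect() is expected to return right away; on Cygwin, as described in
this thread, it would be expected to block.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Bind and listen on `path`, then connect to it from the same thread
 * without ever calling accept().  Returns 0 if connect() succeeded,
 * -1 on any error.  (Hypothetical helper, for illustration only.)
 */
int connect_without_accept(const char *path)
{
    struct sockaddr_un addr;
    int server, client, rc = -1;

    /* The path must fit in sun_path, including the trailing NUL. */
    assert(strlen(path) < sizeof addr.sun_path);
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);               /* remove a stale socket file, if any */

    server = socket(AF_UNIX, SOCK_STREAM, 0);
    client = socket(AF_UNIX, SOCK_STREAM, 0);

    if (server >= 0 && client >= 0
        && bind(server, (struct sockaddr *)&addr, sizeof addr) == 0
        && listen(server, 1) == 0) {
        /* Nobody calls accept(); the connection just sits in the
         * listen queue if the platform allows that. */
        rc = connect(client, (struct sockaddr *)&addr, sizeof addr);
    }

    if (client >= 0)
        close(client);
    if (server >= 0)
        close(server);
    unlink(path);
    return rc == 0 ? 0 : -1;
}
```

Calling connect_without_accept() with any writable path, for example
one under /tmp, is enough to see whether the call returns or hangs.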

IIUC the handshake can't succeed until and unless the server accepts a
connection from the client.  On Linux, however, connect() returns as
soon as the kernel completes the TCP handshake, and the connection is
placed on the server's listen queue without waiting for accept().  I
don't know if the same holds on Windows.  But since the underlying
winsock socket is in non-blocking mode anyway, connect() shouldn't have
to block until the AF_LOCAL handshake can succeed.  I almost wonder
whether the server side shouldn't start up a thread to accept the
AF_LOCAL handshake in this case, but you would know better.
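
An application-level analogue of that idea, as a rough sketch only: run
accept() in a second thread so that something is accepting while the
main thread sits in connect().  This is not what Cygwin does
internally, and I have not verified it on Cygwin; the function names
and the 5 second timeout are arbitrary choices for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Thread body: wait up to 5 s for one connection, accept it, close it.
 * The timeout keeps pthread_join() from hanging if connect() fails. */
static void *accept_one(void *arg)
{
    int server = *(int *)arg;
    struct pollfd pfd = { .fd = server, .events = POLLIN };

    if (poll(&pfd, 1, 5000) > 0) {
        int conn = accept(server, NULL, NULL);
        if (conn >= 0)
            close(conn);
    }
    return NULL;
}

/* Listen on `path`, accept in a helper thread, connect from this one.
 * Returns 0 if connect() succeeded, -1 otherwise.  (Hypothetical.) */
int connect_with_accept_thread(const char *path)
{
    struct sockaddr_un addr;
    pthread_t tid;
    int server, client, rc = -1;

    assert(strlen(path) < sizeof addr.sun_path);
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);

    server = socket(AF_UNIX, SOCK_STREAM, 0);
    client = socket(AF_UNIX, SOCK_STREAM, 0);

    if (server >= 0 && client >= 0
        && bind(server, (struct sockaddr *)&addr, sizeof addr) == 0
        && listen(server, 1) == 0
        && pthread_create(&tid, NULL, accept_one, &server) == 0) {
        rc = connect(client, (struct sockaddr *)&addr, sizeof addr);
        pthread_join(tid, NULL);    /* server stays open until here */
    }

    if (client >= 0)
        close(client);
    if (server >= 0)
        close(server);
    unlink(path);
    return rc == 0 ? 0 : -1;
}
```

This needs to be built with -pthread on most toolchains.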

> This was already found to be a problem when porting postfix, and at
> the time we added a patch to circumvent it.  Before calling connect,
> add this:
>
>   setsockopt (sock_server, SOL_SOCKET, SO_PEERCRED, NULL, 0);
>   setsockopt (sock_client, SOL_SOCKET, SO_PEERCRED, NULL, 0);
>
> This is, of course, a hack.  The problem here is that server and client
> of a socket are independent of each other, and there's typically no
> way to know which process created the server side unless you already
> are connected.  Chicken/egg.

I tried it and it worked, both in the single process and separate
process examples.  I see now--this sets
fhandler_socket::no_getpeerid=true, so it doesn't have to do the
handshake at all.
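
Roughly what my single-process test looks like with the workaround
applied, as a sketch.  The two setsockopt() calls are the ones quoted
above; everything else (function names, the __CYGWIN__ guard, the
error handling) is my own scaffolding.  The guard is there because, as
far as I know, Linux does not allow setting SO_PEERCRED, so the calls
are only made when building for Cygwin.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* On Cygwin, ask the socket to skip the AF_LOCAL credential handshake.
 * Elsewhere this is a no-op.  (Hypothetical wrapper for illustration.) */
static int skip_peercred_handshake(int fd)
{
#ifdef __CYGWIN__
    return setsockopt(fd, SOL_SOCKET, SO_PEERCRED, NULL, 0);
#else
    (void)fd;
    return 0;
#endif
}

/* Listen on `path` and connect to it from the same thread, applying
 * the workaround to both ends before connect().  Returns 0 if
 * connect() succeeded, -1 otherwise. */
int connect_same_process(const char *path)
{
    struct sockaddr_un addr;
    int sock_server, sock_client, rc = -1;

    assert(strlen(path) < sizeof addr.sun_path);
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);

    sock_server = socket(AF_UNIX, SOCK_STREAM, 0);
    sock_client = socket(AF_UNIX, SOCK_STREAM, 0);

    if (sock_server >= 0 && sock_client >= 0
        && bind(sock_server, (struct sockaddr *)&addr, sizeof addr) == 0
        && listen(sock_server, 1) == 0
        && skip_peercred_handshake(sock_server) == 0
        && skip_peercred_handshake(sock_client) == 0) {
        rc = connect(sock_client, (struct sockaddr *)&addr, sizeof addr);
    }

    if (sock_client >= 0)
        close(sock_client);
    if (sock_server >= 0)
        close(sock_server);
    unlink(path);
    return rc == 0 ? 0 : -1;
}
```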

> While replying to your mail, a thought occurred to me, though.
>
> We might get away without the above setsockopt calls by adding a check
> to connect.  It could test if the socket has already been opened by
> the same process and is bound.  This could be accomplished by scanning
> the file descriptor table (dtable) of the process.  If we find it,
> we set the above socket option on both ends and continue without the
> secret and credential check.  Credentials could be set manually since we
> know user, group, and pid at this point.
>
> It's a bit of work but might be feasible.

I see what you're saying, but it appears that would only work in the
case where both sockets are opened by the same process.  Of course,
that was my original use case, but now I'm realizing the problem
extends beyond that--that the handshake can't complete unless the
server is explicitly accepting connections.

Thanks,
Erik

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple