Re: Hangs on connect to UNIX socket being listened on in the same process (was: Cygwin hanging in pselect)

On Mon, Jan 9, 2017 at 6:16 PM, Corinna Vinschen
<corinna-cygwin@xxxxxxxxxx> wrote:
> On Jan  9 16:46, Erik Bray wrote:
>> Hi Corinna,
>> Thanks for the response.
>> On Mon, Jan 9, 2017 at 3:13 PM, Corinna Vinschen wrote:
>> > Right.  It has to do with how connect/accept works on AF_LOCAL sockets.
>> > The handshake doesn't work well for situations like yours, where the
>> > same thread tries to connect and accept on the same socket.
>> Actually I'm not entirely sure now that that's the issue, even
>> considering that this has come up before.  Or at the very least,
>> there's an additional issue.  I realized that when I tried separate
>> client/server processes, in the server I had put an accept() call at
>> the end so it would block there.  With the server waiting to accept a
>> connection it succeeded.  However, when I replaced the accept() with a
>> long sleep(), the client's connect() never returns.
> That's because connect infinitely waits for the accept to reply the
> second half of the handshake.
>> IIUC the handshake can't succeed until and unless the server accepts a
>> connection from the client.
> This is exactly the underlying problem.  And interesting enough, even
> though the handshake is in Cygwin since 2001, we never had a problem
> with this until Christian started porting postfix in 2014!
>> I almost wonder if the server side in this case
>> shouldn't start up a thread to accept the af_local handshake, but you
>> would know better.
> No, I don't.  We discussed this issue briefly back in 2014, but as
> you can see we don't have a solution for this border case yet.
> Starting a thread may or may not work, but there are a couple of
> use-cases to keep in mind (which I can't reproduce off the top of my head).
> The old postfix cygwin-apps thread from 2014 might give you some idea.
>> > This has been found a problem in porting postfix already and at the time
>> > we added a patch to circumvent the problem.  Before calling connect, add
>> > this:
>> >
>> >   setsockopt (sock_server, SOL_SOCKET, SO_PEERCRED, NULL, 0);
>> >   setsockopt (sock_client, SOL_SOCKET, SO_PEERCRED, NULL, 0);
>> >
>> > This is, of course, a hack.  The problem here is that server and client
>> > of a socket are independent of each other, and there's typically no
>> > way to know which process created the server side unless you already
>> > are connected.  Chicken/egg.
>> I tried it and it worked, both in the single process and separate
>> process examples.  I see now--this sets
>> fhandler_socket::no_getpeerid=true, so it doesn't have to do the
>> handshake at all.
> Right.  A better solution for the problem would be nice.  Ultimately
> we want to check if the other side of the socket is actually a Cygwin
> process which knows the secret, not a stray native Windows process
> which accidentally hopped on the bandwagon, and we want to exchange
> the credentials so a subsequent SO_PEERCRED call returns correct values.
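
For reference, here is a minimal sketch of the single-process case with
the workaround quoted above applied.  The function name and the socket
path argument are made up for illustration; the two setsockopt() calls
are the ones from the quoted message, guarded so the same code still
builds on other systems.  Error handling is kept to a bare minimum.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Connect to a listening AF_UNIX socket from the same thread that later
   calls accept().  Returns 0 on success, -1 on any failure.  `path` is a
   hypothetical socket path chosen by the caller. */
int
same_thread_connect_accept (const char *path)
{
  struct sockaddr_un addr;
  int sock_server = -1, sock_client = -1, sock_accepted = -1, ret = -1;

  if (strlen (path) >= sizeof addr.sun_path)
    return -1;
  memset (&addr, 0, sizeof addr);
  addr.sun_family = AF_UNIX;
  strcpy (addr.sun_path, path);
  unlink (path);                /* ignore failure: the path may not exist */

  sock_server = socket (AF_UNIX, SOCK_STREAM, 0);
  sock_client = socket (AF_UNIX, SOCK_STREAM, 0);
  if (sock_server < 0 || sock_client < 0)
    goto out;
  if (bind (sock_server, (struct sockaddr *) &addr, sizeof addr) < 0
      || listen (sock_server, 1) < 0)
    goto out;

#ifdef __CYGWIN__
  /* Workaround from the quoted message: disables the AF_LOCAL handshake
     on both ends, so connect() no longer waits for accept(). */
  setsockopt (sock_server, SOL_SOCKET, SO_PEERCRED, NULL, 0);
  setsockopt (sock_client, SOL_SOCKET, SO_PEERCRED, NULL, 0);
#endif

  /* connect() comes first; accept() only runs after it has returned. */
  if (connect (sock_client, (struct sockaddr *) &addr, sizeof addr) < 0)
    goto out;
  sock_accepted = accept (sock_server, NULL, NULL);
  if (sock_accepted < 0)
    goto out;
  ret = 0;

out:
  if (sock_accepted >= 0)
    close (sock_accepted);
  if (sock_client >= 0)
    close (sock_client);
  if (sock_server >= 0)
    close (sock_server);
  unlink (path);
  return ret;
}
```

Calling same_thread_connect_accept ("/tmp/test.sock") should return 0.
Without the two setsockopt() calls, this is the pattern that hangs in
connect() on Cygwin as described above.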

Ah, okay. I found the original thread you mentioned, and I see that
you sort of discussed some possibilities but nothing was quite
satisfactory at the time, and it was dropped--you mentioned some idea
about exchanging information via pipes, but that was a bit complicated
and half-baked.

Christian described a scheme in that thread which at least seemed like
a way out of the connect hanging problem, and also improved the
security (I think) by having separate server and client secrets, so
that a malicious server could not gain the socket secret from the
client.  But he also worried:

> The only drawback which remains is that the client performs the send()
> before first recv() unconditionally. It will realize the bad server secret
> lately on first recv().

Though you wrote:

> Yeah, but it might be better than nothing and if it avoids the hangs,
> even better.

Which is sort of how I feel, though I do appreciate the security
implication.  One workaround to that which I think might be relatively
simple:  In Christian's scheme, after a connect() the client would be
in a "connected but secret missing" state.  What I would propose
adding is that the client then fires up a thread to wait on receiving
the server's secret (which it would send after receiving the client's
secret in an accept()).  Meanwhile, while the client is in the "secret
missing" state, any subsequent send()s would place the sent data in a
local buffer (no bigger than getsockopt(SO_SNDBUF) ?) that would only
get flushed out to actual WSASendTo calls once the server secret is
received.
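
To make the buffering idea a bit more concrete, here is a rough sketch
in plain POSIX C.  Everything in it is hypothetical: the struct, the
function names, and the fixed 4096-byte buffer (standing in for a size
taken from SO_SNDBUF) are illustration only, and send() stands in for
the WSASendTo calls.  It is not how Cygwin's fhandler_socket is
structured.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical client-side state for a socket that is connected but has
   not yet seen the server's secret.  None of these names exist in Cygwin. */
struct pending_send
{
  char data[4096];   /* fixed stand-in; the idea is to cap this at SO_SNDBUF */
  size_t used;       /* bytes currently queued in data */
  int buffering;     /* nonzero while the server secret is still missing */
};

void
pending_send_init (struct pending_send *ps)
{
  ps->used = 0;
  ps->buffering = 1;
}

/* Queue data while the secret is missing, otherwise send it directly.
   While queueing, returns the number of bytes accepted, which is short
   (possibly 0) when the local buffer is full. */
ssize_t
buffered_send (struct pending_send *ps, int fd, const void *buf, size_t len)
{
  size_t room;

  if (!ps->buffering)
    return send (fd, buf, len, 0);
  room = sizeof ps->data - ps->used;
  if (len > room)
    len = room;
  memcpy (ps->data + ps->used, buf, len);
  ps->used += len;
  return (ssize_t) len;
}

/* Call once the server secret has arrived and been checked.  Flushes the
   queued bytes in order and returns how many were flushed.  Buffering
   stays on if anything is left over, so later sends cannot overtake it. */
ssize_t
secret_received (struct pending_send *ps, int fd)
{
  size_t off = 0;

  while (off < ps->used)
    {
      ssize_t n = send (fd, ps->data + off, ps->used - off, 0);
      if (n <= 0)
        break;
      off += (size_t) n;
    }
  memmove (ps->data, ps->data + off, ps->used - off);
  ps->used -= off;
  if (ps->used == 0)
    ps->buffering = 0;
  return (ssize_t) off;
}
```

The short return from buffered_send() when the buffer is full mirrors
what a non-blocking send() already does, so callers that handle partial
writes would not need special treatment.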

The only downside I see to this is the added overhead of having to
start a thread for the purpose of waiting to receive the server's
secret which--in many common cases--would be unneeded since the server
may accept() immediately.  So in that case we might default to
blocking to receive the server's secret, but with a relatively brief
timeout, and then only start up a thread in case the server secret
isn't received quickly.

