Bug(s) with creating large numbers of sockets




Hi all,

I found a few bugs in Cygwin w.r.t. creating large numbers of sockets.
For example, Cygwin will gladly let you create up to RLIMIT_NOFILE
sockets (examples in Python, where I found this problem):

>>> import resource
>>> import socket
>>> resource.getrlimit(resource.RLIMIT_NOFILE)
(256, 3200)
>>> resource.setrlimit(resource.RLIMIT_NOFILE, (3200, 3200))
>>> socks = [socket.socket() for _ in range(3000)]  # A bit fewer than the max but it doesn't matter

However, if I try to do anything interesting with those sockets, such
as poll on them, I get a rather unexpected error:

>>> import select
>>> poll = select.poll()
>>> for sock in socks:
...     poll.register(sock, select.POLLOUT)
...
>>> poll.poll()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 14] Bad address
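
To pin down how many sockets can be polled before this error shows up, a
bisection along these lines may help.  This is only a sketch: find_limit()
and can_poll() are names made up here, and it assumes that failures are
monotonic (if n sockets fail, n + 1 fail as well) and that no other
process is opening sockets while it runs.

```python
import select
import socket


def can_poll(n):
    """Return True if n fresh sockets can be created, registered and polled."""
    socks = []
    try:
        for _ in range(n):
            socks.append(socket.socket())
        poller = select.poll()
        for sock in socks:
            poller.register(sock, select.POLLOUT)
        poller.poll(0)  # zero timeout: only checks that the call itself succeeds
        return True
    except OSError:
        return False
    finally:
        for sock in socks:
            sock.close()


def find_limit(probe, upper):
    """Largest n in [0, upper] for which probe(n) is true (probe assumed monotonic)."""
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Calling find_limit(can_poll, 3000) after raising RLIMIT_NOFILE as above
would then report the largest count that still polls cleanly.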

After some playing around I found that I could create exactly 1365
sockets and use them without error.  At 1366 I get the error, which is
a very strange and arbitrary-looking number.  It turns out the limit
comes from the size of the wsa_events array in fhandler_socket.cc:

/* Maximum number of concurrently opened sockets from all Cygwin processes
   per session.  Note that shared sockets (through dup/fork/exec) are
   counted as one socket. */
#define NUM_SOCKS       (32768 / sizeof (wsa_event))
...
static wsa_event wsa_events[NUM_SOCKS] __attribute__((section (".cygwin_dll_common"), shared));

This choice for NUM_SOCKS is still seemingly small and pretty
arbitrary, but at least it's a choice, and probably well-motivated.
However, I think it's a problem that it's defined in terms of
sizeof(wsa_event).  On 32-bit Cygwin this is 16, so NUM_SOCKS is 2048
(a less strange number), whereas on 64-bit Cygwin sizeof(wsa_event) ==
24 (due to sizeof(long) == 8, plus alignment), so we are limited
to...1365 sockets.
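
The arithmetic can be checked with a short sketch.  The two ctypes
structures below are an assumed field layout chosen only to reproduce
the sizes quoted above (one field widening from 4 to 8 bytes); they are
not copied from the Cygwin definition of wsa_event, and the 24-byte
result assumes a host where 64-bit integers are 8-byte aligned.

```python
import ctypes

TABLE_BYTES = 32768  # numerator of the NUM_SOCKS macro


class WsaEvent32(ctypes.Structure):
    # Assumed layout: four 4-byte fields, no padding needed.
    _fields_ = [
        ("serial_number", ctypes.c_int32),
        ("events", ctypes.c_int32),  # stands in for a 4-byte long
        ("connect_errorcode", ctypes.c_int32),
        ("owner", ctypes.c_int32),
    ]


class WsaEvent64(ctypes.Structure):
    # Same assumed layout with an 8-byte long: 4 bytes of padding are
    # inserted before it so that it stays 8-byte aligned.
    _fields_ = [
        ("serial_number", ctypes.c_int32),
        ("events", ctypes.c_int64),  # stands in for an 8-byte long
        ("connect_errorcode", ctypes.c_int32),
        ("owner", ctypes.c_int32),
    ]


def num_socks(event_size):
    """Integer division, as the C macro performs it."""
    return TABLE_BYTES // event_size


for struct in (WsaEvent32, WsaEvent64):
    size = ctypes.sizeof(struct)
    print(struct.__name__, size, num_socks(size))
```

With sizes of 16 and 24 bytes, num_socks() gives 2048 and 1365, matching
the limits described above.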

If we have to set a limit I would just hard-code it to exactly 2048.
I understand that the overhead associated with sockets in Cygwin
probably prevents us from having tens of thousands (much less millions)
and that's OK--I'm not trying to run some kind of C10K challenge on
Cygwin :)

The other problem, then, seems to be a bug in
fhandler_socket::init_events().  It doesn't check the return value of
search_wsa_event_slot(), which returns NULL if the wsa_events array is
full (and the socket is not a shared socket).  There's no great choice
of error code here, but setting ENOBUFS seems like the best option.
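
From the Python side, a caller could then deal with the failure at
creation time instead of hitting EFAULT later in poll().  The sketch
below assumes that, with such a check in place, socket.socket() raises
OSError with errno ENOBUFS once the table is full; that is an assumption
about the patched behavior.  open_up_to() and its factory parameter are
hypothetical names used only for this illustration.

```python
import errno
import socket


def open_up_to(n, factory=socket.socket):
    """Open at most n sockets, stopping early when the system runs out.

    ENOBUFS is the error assumed for a full wsa_events table; EMFILE is
    the usual error when RLIMIT_NOFILE is reached.  Any other error is
    re-raised unchanged.
    """
    socks = []
    for _ in range(n):
        try:
            socks.append(factory())
        except OSError as exc:
            if exc.errno in (errno.ENOBUFS, errno.EMFILE):
                break
            raise
    return socks
```

The factory argument is there only so the function can be exercised
without actually exhausting sockets; in normal use it is left at its
default.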

Please see attached patch.

Best,
Erik

Attachment: 0001-Fix-two-bugs-in-the-limit-of-large-numbers-of-socket.patch
