Dropping requests and not closing sockets (CLOSE_WAIT state)

Added by Maciej Dudyński over 2 years ago

Hi.
My Wt-based application has been behaving quite strangely recently: the longer it is up, the slower it gets and the more requests are lost or timed out.
I'm almost 100% sure that there are no problems connected to CPU/RAM usage.

I'm starting to believe that there are some issues in Wt, especially in how it handles sockets when a connection times out. The longer my application runs, the bigger the values shown in the "new connection" and "removed connection" logs. Even when there are no active sessions, the logs show more than a couple of hundred connections. I also see in netstat that there is a socket in the "CLOSE_WAIT" state for each connection.
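To track the number of CLOSE_WAIT sockets over time, the netstat output can be tallied by state. The following is a minimal Python sketch, not something taken from this thread: the function name and the sample output (addresses, ports) are made up for illustration, and it assumes the usual `netstat -tan` column layout where the state is the sixth field.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Tally TCP sockets by state from `netstat -tan` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with tcp/tcp6; the state is the sixth column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        1      0 172.17.0.2:8080         172.17.0.1:51234        CLOSE_WAIT
tcp        1      0 172.17.0.2:8080         172.17.0.1:51240        CLOSE_WAIT
tcp        0      0 172.17.0.2:8080         172.17.0.1:51302        ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in sorted(count_tcp_states(SAMPLE).items()):
        print(state, count)
```

In practice the text would come from running netstat inside the application container and passing its output to the function, for example once a minute, so the CLOSE_WAIT count can be compared against the connection counters in the Wt log.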

So I started looking at the Wt code and found something (at least I think so). The function Connection::doTimeout() in src/http/Connection.C shuts the socket down but never closes it. Is this correct? Everywhere else in the Wt code the two always appear together.
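As background on why the distinction matters: a socket sits in CLOSE_WAIT when the peer has closed its side and the local process has not yet closed its own descriptor. Shutting a socket down signals end-of-stream to the peer but does not release the descriptor; only closing does. The sketch below illustrates this general socket behaviour in Python using a local socket pair. It is not Wt's code, and the function name is made up for illustration.

```python
import socket


def shutdown_then_close():
    """Show that shutdown() keeps the descriptor open and close() releases it."""
    a, b = socket.socketpair()

    # shutdown() ends the stream in both directions; the peer sees EOF.
    a.shutdown(socket.SHUT_RDWR)
    peer_saw_eof = b.recv(16) == b""

    # The descriptor is still allocated after shutdown().
    fd_after_shutdown = a.fileno()

    # close() is what actually releases the descriptor.
    a.close()
    fd_after_close = a.fileno()

    b.close()
    return peer_saw_eof, fd_after_shutdown, fd_after_close


if __name__ == "__main__":
    eof, fd_open, fd_closed = shutdown_then_close()
    print("peer saw EOF:", eof)
    print("descriptor still open after shutdown:", fd_open >= 0)
    print("descriptor after close:", fd_closed)
```

Whether a missing close in doTimeout() is actually responsible for the CLOSE_WAIT sockets described here is a separate question; the sketch only shows why a shutdown alone would not be enough to free a socket.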

I'm currently building Wt with some logging added to doTimeout(), and I will try to find a correlation between the number of timeouts and the number of unclosed sockets.


Replies (5)

RE: Dropping requests and not closing sockets (CLOSE_WAIT state) - Added by Maciej Dudyński over 2 years ago

So I added two LOG_ERRORs: one inside the if statement in the Connection::timeout function and one inside Connection::doTimeout.

About 24 hours after deploying, the logs show that Connection::timeout was executed 226 times, but Connection::doTimeout was never executed. Do you know what the reason for this strange behaviour is?

Also, after those 226 timeouts I have ~237 sockets in the CLOSE_WAIT state, and I see a really strong correlation between those two numbers. The current number of sockets is slightly bigger, but that's probably because of active sessions.

RE: Dropping requests and not closing sockets (CLOSE_WAIT state) - Added by Korneel Dumon over 2 years ago

Hi,

the timeout callback is also executed when the timer is cancelled or reset (it gets status aborted in that case). I think that explains the difference between the number of calls to timeout and doTimeout.

Of course, the CLOSE_WAIT issue remains ... Are you using websockets or WApplication::enableUpdates() in your application?

RE: Dropping requests and not closing sockets (CLOSE_WAIT state) - Added by Maciej Dudyński over 2 years ago

Hi,

We are using enableUpdates().
The timeouts (or, as you wrote, cancels/resets) also appear when somebody accesses static resources (for example, images in /resources). So changing our approach probably won't fix this issue.

Btw:

  • Our application runs in Docker on Alpine.
  • The application is behind Nginx (OpenResty), which runs in Docker on Ubuntu.
  • Both containers are on an Ubuntu host. Do you think there may be issues caused by our application running on a different system (totally different libraries)?

RE: Dropping requests and not closing sockets (CLOSE_WAIT state) - Added by Korneel Dumon over 2 years ago

Hi,

it appears Bruce Toll had the same issue and he figured out the problem. This issue is now being tracked here: https://redmine.emweb.be/issues/9106

RE: Dropping requests and not closing sockets (CLOSE_WAIT state) - Added by Maciej Dudyński over 2 years ago

Hi,

We don't know why it behaves the way I described in my previous posts, but we found a quite simple solution (or workaround):
Disabling userland-proxy in the Docker service (adding "userland-proxy": false to /etc/docker/daemon.json) fixed the issue.
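
For anyone applying the same workaround to a daemon.json that already contains other settings, the key needs to be merged in rather than overwriting the file. Here is a minimal Python sketch of that merge. The function name is made up, and the "log-driver" entry in the usage example is only a stand-in for whatever settings a given host already has.

```python
import json


def disable_userland_proxy(daemon_json_text):
    """Return daemon.json content with "userland-proxy" set to false.

    Existing keys are preserved; empty input is treated as an empty config.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["userland-proxy"] = False
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Stand-in for an existing /etc/docker/daemon.json.
    existing = '{"log-driver": "json-file"}'
    print(disable_userland_proxy(existing))
```

The sketch only produces the new file content. Writing it back to /etc/docker/daemon.json requires root, and the Docker daemon typically has to be restarted before the change takes effect.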