Bug #7093

Deadlock when chaining http-requests

Added by Marco Kinski over 3 years ago. Updated over 2 years ago.

Status: Feedback
Priority: Normal
Assignee: -
Target version: -
Start date: 06/17/2019
Due date:
% Done: 0%
Estimated time:

Description

I have found what looks like a deadlock when chaining HTTP requests.

The lockup happens every once in a while (roughly once every 100 requests).

When it occurs, WtHttp does not serve any additional requests until the initial one has completed.

It happens regardless of the HTTP client used: besides Wt::Http::Client, I also tried WinINet and WinHTTP.

Windows 10

Wt-Version: 3.3.12

Boost: 1.69

MSVC 2017


Files

main.cpp (1.81 KB), Marco Kinski, 06/17/2019 11:14 AM
main.cpp (1.88 KB), Marco Kinski, 06/18/2019 06:43 PM
ParallelThreads_success.PNG (113 KB), Marco Kinski, 06/18/2019 06:43 PM
ParallelThreads_failure.PNG (67.9 KB), Marco Kinski, 06/18/2019 06:43 PM
issue_7093.cpp (2.02 KB), Roel Standaert, 06/19/2019 12:25 PM
HttpClientServerTest.C (5.11 KB), Marco Kinski, 07/03/2020 12:11 PM

#1

Updated by Roel Standaert over 3 years ago

  • Status changed from New to Feedback

Isn't this just the result of your busy-wait loop using up all of the threads in the pool?

WServer has a fixed-size thread pool. If all threads in the pool are busy-waiting, there are no threads left for I/O.
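The starvation described above can be reproduced outside Wt. The sketch below is a model under stated assumptions, not Wt code: FixedPool is a hypothetical stand-in for WServer's fixed-size pool, and blockingHandlerCompletes (also hypothetical) plays a request handler that starts a second task on the same pool and then blocks until it finishes. A 200 ms timeout replaces the endless busy wait so the demo terminates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical fixed-size thread pool, standing in for WServer's pool.
class FixedPool {
 public:
  explicit FixedPool(std::size_t n) {
    assert(n > 0);
    for (std::size_t i = 0; i < n; ++i)
      workers_.emplace_back([this] { run(); });
  }
  ~FixedPool() {
    {
      std::lock_guard<std::mutex> lock(m_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // queued tasks are drained first
  }
  void post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(m_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // only reached when stopping
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex m_;
  std::condition_variable cv_;
  bool stop_ = false;
};

// Models a handler that starts an "outgoing request" on the same pool and
// then blocks the current pool thread until it finishes (bounded by 200 ms).
// Returns true if the inner task got a thread and ran in time.
bool blockingHandlerCompletes(std::size_t poolSize) {
  std::promise<bool> outcome;
  std::future<bool> result = outcome.get_future();
  FixedPool pool(poolSize);  // declared last, so it is joined first
  pool.post([&pool, &outcome] {
    auto fetched = std::make_shared<std::promise<void>>();
    std::future<void> fetchedFuture = fetched->get_future();
    pool.post([fetched] { fetched->set_value(); });  // the inner task
    bool ok = fetchedFuture.wait_for(std::chrono::milliseconds(200)) ==
              std::future_status::ready;  // blocks this pool thread
    outcome.set_value(ok);
  });
  return result.get();
}
```

On an otherwise idle machine, blockingHandlerCompletes(1) should return false, because the only worker is stuck waiting for a task queued behind it, while blockingHandlerCompletes(2) should return true. With a larger pool the same starvation appears once every thread is blocked in such a wait.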

#2

Updated by Marco Kinski over 3 years ago

It should only block 3-4 threads:

  • the connection from curl, inside the wait loop
  • the HTTP client, outgoing
  • the HTTP client, incoming
  • the session cleanup thread

I have updated the code.

I also fixed the example: it used wApp, which was not valid there.

ParallelThreads_success.png shows the threads when single-stepping through a successful (non-hanging) request in the debugger.

ParallelThreads_failure.png shows the threads when pausing a hanging request in the debugger.

I am not familiar with Boost.Asio; maybe it's an easy problem.

#3

Updated by Roel Standaert over 3 years ago

How many of those curl processes are you running at the same time when you observe this issue?

#4

Updated by Marco Kinski over 3 years ago

One at a time, repeatedly:

$ while [ "$(curl -s 'http://localhost/FetchData')" == "done" ]; do echo -n '.'; sleep 1; done

After a request hangs (and gets aborted by a timeout in curl), subsequent requests are processed normally, until it hangs again after ~100 requests.

#5

Updated by Marco Kinski over 3 years ago

Marco Kinski wrote:

One at a time, repeatedly:

$ while [ "$(curl -s 'http://localhost/FetchData')" == "done" ]; do echo -n '.'; sleep 1; done

After a request hangs (and gets aborted by a timeout), subsequent requests are processed normally, until it hangs again after ~100 requests.

#6

Updated by Roel Standaert over 3 years ago

Wt submits tasks to a pool of 10 threads by default. If all 10 of those threads are busy, it can't do anything else.

I think maybe you're expecting it to only do one FetchData::handleRequest() at a time? It's perfectly possible that 10 threads are in FetchData::handleRequest() handling 10 requests at the same time. Of course, if all of those wait for another task to be completed using the same thread pool (either handling another request coming in, or the actual request being performed by the client), then they will hang.

So, it's not surprising to me that this deadlocks as a result of how FetchData::handleRequest() is implemented. It is maybe a bit surprising to see where exactly it is hanging in that screenshot; I'd rather expect it to hang in the while (!done) sleep loop.

The solution here is to not block. See the attachment for how a continuation can be used instead.
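The non-blocking shape can be modelled in standard C++ as well. This is a sketch of the idea only, not Wt's own continuation API: FixedPool, fetchAsync and continuationHandlerResult are hypothetical names. The handler starts the outgoing request, hands over a callback that finishes the pending response, and returns at once, so no pool thread ever sits waiting.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical fixed-size thread pool, standing in for WServer's pool.
class FixedPool {
 public:
  explicit FixedPool(std::size_t n) {
    assert(n > 0);
    for (std::size_t i = 0; i < n; ++i)
      workers_.emplace_back([this] { run(); });
  }
  ~FixedPool() {
    {
      std::lock_guard<std::mutex> lock(m_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // queued tasks are drained first
  }
  void post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(m_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // only reached when stopping
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex m_;
  std::condition_variable cv_;
  bool stop_ = false;
};

// Hypothetical asynchronous fetch: runs on the pool, then invokes the
// callback with the response body instead of making the caller wait.
void fetchAsync(FixedPool& pool, std::function<void(std::string)> onDone) {
  pool.post([onDone] { onDone("done"); });
}

// Models a handler that starts the fetch and returns immediately; the
// callback (the continuation) completes the pending response later.
std::string continuationHandlerResult(std::size_t poolSize) {
  std::promise<std::string> response;
  std::future<std::string> result = response.get_future();
  FixedPool pool(poolSize);  // declared last, so it is joined first
  pool.post([&pool, &response] {
    fetchAsync(pool, [&response](std::string body) {
      response.set_value(std::move(body));
    });
    // No waiting here: this pool thread is immediately free again.
  });
  if (result.wait_for(std::chrono::seconds(2)) != std::future_status::ready)
    return "timeout";
  return result.get();
}
```

Even with a one-thread pool, continuationHandlerResult(1) returns "done": the single worker first runs the handler, which returns, and then picks up the fetch task that completes the response.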

#7

Updated by Roel Standaert over 3 years ago

Or... do you mean that you are actually just doing that one while loop and nothing else? That would be strange.

#8

Updated by Marco Kinski over 3 years ago

It's just this one loop running at the time it hangs.

The 9 other threads of the pool seem to be waiting for work.

#9

Updated by Roel Standaert over 3 years ago

Ok, sorry for the confusion. I do observe this myself now. Not sure why it would hang like that, though.

#10

Updated by Marco Kinski over 3 years ago

No problem, it's a really strange problem, and there are plenty of ways it could be caused by wrong usage :-)

#11

Updated by Marco Kinski over 3 years ago

This issue is very urgent for me. Any idea what's wrong is welcome.

The suggested fix is not feasible for me; the real-world situation is much more complex.

It seems that any connection waiting to be processed can pause any parallel upcoming request.

#12

Updated by Roel Standaert over 3 years ago

Frankly, I don't have a clue. I think it's absolutely bizarre, and I haven't found a solution for it yet. Oddly, though, avoiding the busy wait (by using a continuation instead) did seem to fix it.

#13

Updated by Marco Kinski over 2 years ago

I modified test/HttpClientServerTest.C to include a test for this. Hopefully this helps hunt it down.

#14

Updated by Marco Kinski over 2 years ago

With the latest test code (HttpClientServerTest.C) I get an access violation. I am not sure whether this is inside Wt or caused by wrong usage of the Client class in the test. The access violation happens during destruction of Wt::Core::Impl::observer_info, which is called in parallel with Wt::Core::Impl::observer_info::removeObserver.

I noticed that while the lockup happens, the call stack of the running thread is always inside Wt::WReply::consumeRequestBody. After removing the optimization (line 240) that handles requests directly instead of posting them to the strand, the lockup is gone.
