Problem

Inside AWS ECS, the application's requests to other HTTP endpoints fail with the exception "System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing." The problem only occurs with .NET-based applications.

Full stacktrace

System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
at System.Threading.Tasks.Task.FinishContinuations()
at System.Threading.Tasks.Task.FinishStageThree()
at System.Threading.Tasks.Task.CancellationCleanupLogic()
at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception, Task`1& taskField)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception)
at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)

Context

  • The application must use an HTTP (8080) or HTTPS (8443) proxy to connect to the HTTP endpoints. In our case, it is configured to use the HTTPS proxy.
  • Running cURL in the same ECS container against the HTTP endpoint before the .NET application starts works without any problems; the proxy connection works.
  • X.509 certificates provided by the proxy can be validated by the operating system; there are no certificate errors when running cURL.
  • IPv6 in the AWS VPC is disabled.
  • DNS name resolution with Dns.GetHostAddresses inside the application returned the correct IPv4 addresses.

simple network diagram with the issue

Findings

  • The issue could not be reproduced when running the Docker container on a local Fedora or WSL2 instance.
  • It did not make any difference when
    • downgrading from Docker image dotnet/aspnet:6.0 to dotnet/aspnet:5.0.
    • compiling with .NET 5.0 or .NET 6.0.
    • forcing HTTP/1.1 or HTTP/2 for the HttpClient instance.
    • ignoring all X.509 checks by using ServerCertificateCustomValidationCallback.
    • preferring IPv4 over IPv6.
  • A simple await Socket.ConnectAsync call in the .NET application connected to the proxy successfully.
  • The proxy did not record any incoming requests when using HttpClient.
  • Related GitHub issues are
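A check like the Socket.ConnectAsync test above only shows that the TCP handshake with the proxy completes; it says nothing about the protocol spoken afterwards. The following minimal Python sketch of such a probe is an illustration, not the .NET code used in the investigation. It also reports which address family and peer address the connection ended up using. The function name probe_tcp is hypothetical.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> tuple[str, str]:
    """Open a plain TCP connection and report the address family and peer IP.

    Success only means the TCP handshake completed; no HTTP or TLS is exchanged.
    Raises OSError if no resolved address accepts the connection.
    """
    # create_connection resolves the host and tries each returned address in turn.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        family = socket.AddressFamily(sock.family).name  # e.g. "AF_INET" or "AF_INET6"
        peer_ip = sock.getpeername()[0]
        return family, peer_ip
```

For example, probe_tcp("proxy.example.internal", 8080), with a hypothetical proxy hostname, returns the family name and the proxy's IP address if the handshake succeeds. A successful result here combined with failing HTTP requests points at a problem above the TCP layer.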

Solution

Debugging with strace

To get more debugging information, we installed strace in the Docker/ECS container. To run strace, the ECS container had to be started with the SYS_PTRACE capability:

  YourEcsTaskDefinition:
    Properties:
      ContainerDefinitions:
      - Name: proxy-test
        Environment:
        # ...
        Image: '...'
        LinuxParameters:
          Capabilities:
            Add:
            # on Fargate, only SYS_PTRACE is possible.
            - SYS_PTRACE

IPv6

After tracing the application with strace, we noticed that .NET's HttpClient always uses IPv6 sockets:

1651670000000,"[pid 54] setsockopt(50, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0"
1651670000000,"[pid 54] setsockopt(50, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0"

This also happened when the HTTPS proxy was configured with an IPv4 address instead of a hostname. In both cases, the IPv4 address was converted into an IPv4-mapped IPv6 address (::ffff:a.b.c.d) and used over the IPv6 socket. Using IPv6 resulted in the timeout above.
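The two setsockopt calls can be reproduced outside of .NET to see what such a socket does. The following minimal Python sketch for Linux is an illustration of dual-mode sockets, not .NET's implementation, and the helper names are hypothetical. It opens an AF_INET6 socket, clears IPV6_V6ONLY as in the first traced call (strace prints the level as SOL_IPV6, which Python exposes as IPPROTO_IPV6), and reaches an IPv4 endpoint through its IPv4-mapped IPv6 address.

```python
import ipaddress
import socket


def ipv4_mapped(ipv4: str) -> str:
    """Return the IPv4-mapped IPv6 form (::ffff:a.b.c.d) of an IPv4 address."""
    return "::ffff:" + str(ipaddress.IPv4Address(ipv4))  # also validates the input


def connect_dual_mode(ipv4: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Reach an IPv4 endpoint through an IPv6 (dual-mode) socket."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    # Same call as in the trace: setsockopt(fd, SOL_IPV6, IPV6_V6ONLY, [0], 4)
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.settimeout(timeout)
    try:
        # IPv6 socket address tuple: (address, port, flowinfo, scope_id)
        sock.connect((ipv4_mapped(ipv4), port, 0, 0))
    except OSError:
        sock.close()
        raise
    return sock
```

The returned socket reports AF_INET6 as its family, even though the peer is an IPv4 host.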

Since .NET 5.0, HttpClient/SocketsHttpHandler uses DualMode sockets, which means that IPv4 traffic is handled over IPv6 sockets. As described in the .NET 6 Networking Improvements, we disabled IPv6 by setting the environment variable DOTNET_SYSTEM_NET_DISABLEIPV6 to 1.
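Conceptually, disabling IPv6 should make the client resolve and connect with plain IPv4 sockets instead of dual-mode IPv6 sockets, so no IPV6_V6ONLY calls are needed. The Python sketch below only mimics that effect by resolving IPv4 addresses only and connecting with an AF_INET socket. It is not how .NET implements DOTNET_SYSTEM_NET_DISABLEIPV6, and the function name is hypothetical.

```python
import socket


def connect_ipv4_only(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Resolve host to IPv4 addresses only and connect with an AF_INET socket."""
    last_error = None
    # family=AF_INET restricts resolution to IPv4 results (A records).
    for family, socktype, proto, _name, sockaddr in socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)  # plain IPv4 socket, no IPV6_V6ONLY
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc  # try the next resolved address
    raise last_error if last_error else OSError(f"no IPv4 address found for {host}")
```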

After setting the environment variable, strace showed that the application now sent the CONNECT request for the proxy connection to the proxy server.

HTTPS proxy

Even after disabling IPv6, the HTTPS proxy did not receive the CONNECT request. By re-reading the documentation and looking through open GitHub issues, we found the following:

As we were using the HTTPS/8443 proxy, the proxy expected a TLS connection on top of TCP. The application, however, opened an unencrypted TCP connection and spoke plain HTTP. Obviously, the protocols did not match. Instead of terminating the connection immediately, both endpoints kept it open. After 100 seconds, .NET ran into the default timeout and threw "System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing." The following happened:
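The mismatch becomes visible in the first bytes of the connection. A TLS endpoint such as the HTTPS/8443 proxy expects a TLS handshake record, which starts with content type 0x16 followed by the major version byte 0x03. A client that treats the proxy as plain HTTP instead starts with the ASCII request line "CONNECT host:port HTTP/1.1". The hypothetical Python helper below classifies those first bytes; it is an illustration, not code from the investigation.

```python
def classify_first_bytes(data: bytes) -> str:
    """Guess which protocol a client started speaking from its first bytes."""
    # A TLS connection starts with a handshake record: type 0x16, version major 0x03.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"
    # A plaintext proxy client starts with an HTTP request line such as "CONNECT ...".
    if data.startswith(b"CONNECT "):
        return "http-connect"
    return "unknown"
```

A TLS listener that receives "http-connect" bytes cannot parse them as a handshake record, which matches the situation described above.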

Instead of using the HTTPS/8443 proxy, we simply switched to the HTTP/8080 proxy alternative. This sequence diagram shows the packet flow between our application, the HTTP proxy server and the target endpoint:
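The flow through the HTTP/8080 proxy can be sketched at socket level. The client opens plain TCP to the proxy, sends a CONNECT request naming the target host and port, and waits for a 2xx response. Only then does it start TLS with the target through the tunnel. The following minimal Python sketch is not how HttpClient is implemented. The function names and hostnames are hypothetical, and it omits proxy authentication, IPv6 literal targets and robust header parsing.

```python
import socket
import ssl


def build_connect_request(target_host: str, target_port: int) -> bytes:
    """Plaintext CONNECT request sent to an HTTP proxy: only host and port are exposed."""
    authority = f"{target_host}:{target_port}"
    return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n".encode("ascii")


def parse_status_code(response_head: bytes) -> int:
    """Extract the status code from e.g. b'HTTP/1.1 200 Connection established'."""
    status_line = response_head.split(b"\r\n", 1)[0]
    return int(status_line.split(b" ", 2)[1])


def open_tunnel(proxy_host: str, proxy_port: int, target_host: str,
                target_port: int = 443, timeout: float = 10.0) -> ssl.SSLSocket:
    """Tunnel through an HTTP proxy with CONNECT, then start TLS with the target."""
    raw = socket.create_connection((proxy_host, proxy_port), timeout=timeout)
    try:
        raw.sendall(build_connect_request(target_host, target_port))
        # Read until the end of the proxy's response headers. This assumes the proxy
        # sends nothing after them (the target waits for our TLS ClientHello).
        head = b""
        while b"\r\n\r\n" not in head:
            chunk = raw.recv(4096)
            if not chunk:
                raise ConnectionError("proxy closed the connection during CONNECT")
            head += chunk
        status = parse_status_code(head)
        if not 200 <= status < 300:
            raise ConnectionError(f"proxy refused CONNECT with status {status}")
        # From here on the proxy only relays bytes; TLS runs end-to-end with the target.
        context = ssl.create_default_context()
        return context.wrap_socket(raw, server_hostname=target_host)
    except Exception:
        raw.close()
        raise
```

For example, open_tunnel("proxy.example.internal", 8080, "api.example.com") would return a TLS socket to api.example.com if the proxy accepts the CONNECT request.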

From a security standpoint, using the HTTP/8080 proxy is not an issue. Only the initial CONNECT request is unencrypted, and it contains nothing but the target hostname and target port. All further traffic is encrypted with the protocol the target web server provides, e.g. TLS 1.3. Having the target hostname and port unencrypted in the first request is currently not an issue either: the same metadata would also be visible in an otherwise encrypted HTTPS/TLS connection, because the hostname is sent in plain text in the TLS Server Name Indication (SNI) extension. Encrypted SNI and its successor Encrypted Client Hello (ECH) are not yet widely deployed.

In the end, this network diagram shows our solution from a bird's-eye view:

simple network diagram with the solution