Description
Context
When multiple hosts are provided to a client and a TransportError occurs (e.g. Faraday::ConnectionFailed), the request is passed on to the next available host. When all hosts have been tried but retry attempts remain, all hosts are revived. This behavior seems pretty straightforward.
Problem
However, when the retry_on_status option is provided to the client along with multiple hosts, all retries are attempted against the erroring host, and the secondary hosts are never queried. As I understand it, this is because retry is called immediately, before the host connection can be killed.
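The two retry paths can be compared with a small Ruby model. TinyPool, ServiceUnavailable and attempt_log are hypothetical stand-ins, not elasticsearch-transport classes. The assumption that the pool keeps handing back the same live host until it is marked dead is taken from the behavior reported here, not from the library's source. With kill_first: false the retry happens before the host is marked dead, so every attempt lands on the first host. With kill_first: true the host is marked dead before retrying, which reproduces the alternating pattern seen for transport errors.

```ruby
# Hypothetical stand-ins; none of these are elasticsearch-transport classes.
class ServiceUnavailable < StandardError; end

# Minimal pool: hands back the first live host and revives every dead host
# once no live host is left (assumed behavior, for illustration only).
class TinyPool
  def initialize(hosts)
    @alive = hosts.dup
    @dead = []
  end

  def next_host
    if @alive.empty?
      @alive = @dead
      @dead = []
    end
    @alive.first
  end

  def mark_dead(host)
    @dead << @alive.delete(host)
  end
end

# Every request fails with a status listed in retry_on_status (503 here).
# kill_first: false retries before the host is marked dead (the reported
# behavior); kill_first: true marks it dead first (a possible fix).
# Returns the host used for each attempt; the final error is swallowed to
# keep the model short, whereas a real client would raise it.
def attempt_log(hosts, max_retries, kill_first:)
  pool = TinyPool.new(hosts)
  tried = []
  tries = 0
  begin
    tries += 1
    host = pool.next_host
    tried << host
    raise ServiceUnavailable, "503 from #{host}"
  rescue ServiceUnavailable
    pool.mark_dead(host) if kill_first
    retry if tries <= max_retries
  end
  tried
end

hosts = %w[foobar-us-east-1 bazbat-us-east-2]
p attempt_log(hosts, 6, kill_first: false) # 7 entries, all "foobar-us-east-1"
p attempt_log(hosts, 6, kill_first: true)  # 7 entries, alternating, starting with "foobar-us-east-1"
```

Passing 6 as max_retries matches the 7 attempts in the logs for 2 hosts and retry_on_failure: 3.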
Not only is this somewhat unexpected behavior, but there is the added wrinkle that the retry count is scaled up for multi-host connections regardless of whether all of those connections are used. So with 2 hosts on a client and a retry_on_failure value of 3, a transport error will retry 3 times on each of host 1 and host 2, alternating between the two, but on an error whose status is listed in retry_on_status, host 1 receives all 6 retries (7 requests in total) before the client gives up.
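The attempt budget can be checked with a small Ruby helper. It assumes the total number of requests is retry_on_failure × number of hosts + 1, which matches the 7 attempts in the logs below for 3 retries and 2 hosts. The helper names are hypothetical. It shows how the same budget is spread across hosts when attempts rotate (transport errors) versus when they all stay on the first host (retry_on_status errors).

```ruby
# Hypothetical helpers; the formula mirrors the attempt counts in the logs
# in this report, not a documented elasticsearch-transport setting.
def total_attempts(retry_on_failure, host_count)
  retry_on_failure * host_count + 1 # scaled retries plus the first request
end

# Counts requests per host. rotate: true alternates hosts on every attempt
# (transport errors); rotate: false keeps hitting the first host
# (retry_on_status errors).
def requests_per_host(hosts, retry_on_failure, rotate:)
  counts = Hash.new(0)
  total_attempts(retry_on_failure, hosts.size).times do |i|
    host = rotate ? hosts[i % hosts.size] : hosts.first
    counts[host] += 1
  end
  counts
end

hosts = %w[foobar-us-east-1 bazbat-us-east-2]
p total_attempts(3, hosts.size)              # 7
p requests_per_host(hosts, 3, rotate: true)  # foobar-us-east-1: 4, bazbat-us-east-2: 3
p requests_per_host(hosts, 3, rotate: false) # foobar-us-east-1: 7
```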
Version
We are still way back on version 5.0.4, but this behavior appears to be the same all the way through 7.7.0.
Example
Here are logs from a client with two hosts, ["foobar-us-east-1", "bazbat-us-east-2"], retry_on_failure: 3, and retry_on_status: [503].
Expected behavior, and the current behavior of TransportErrors:
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 1 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 2 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 3 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 4 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 5 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 6 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 7 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"FATAL","msg":"[Faraday::ConnectionFailed] Cannot connect to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"} after 7 tries"}
Note how attempts bounce back and forth between foobar-us-east-1 and bazbat-us-east-2.
Current behavior of retry_on_status errors:
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 1 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 2 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 3 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 4 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 5 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 6 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 7 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"FATAL","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Cannot get response from http://foobar-us-east-1:9200/_search after 7 tries"}
Note how attempts are only made to foobar-us-east-1, and there are retry_count * number of hosts + 1 of them (3 * 2 + 1 = 7).