Client does not retry alternative hosts when retry_on_status is triggered #893

@jackellenberger

Context

When multiple hosts are provided to a client and a TransportError occurs, e.g. Faraday::ConnectionFailed, the request is passed on to be tried on the next available host. When all hosts have been tried but retry attempts remain, all hosts are revived. This behavior seems pretty straightforward.

Problem

However, when the retry_on_status option is provided to the client, along with multiple hosts, all retries are attempted against the erroring host, and secondary hosts are never queried. In my understanding, this is because retry is called immediately, before the host connection can be killed.

Besides being somewhat unexpected, this has an added wrinkle: the retry count is scaled up for multi-host clients regardless of whether all of those hosts are actually used. With 2 hosts on a client and a retry_on_failure value of 3, a transport error is retried 3 times each on host 1 and host 2, alternating between the two, but an error whose status is listed in retry_on_status causes host 1 alone to see 6 retries (7 requests in total) before the client gives up.
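
To make the difference concrete, here is a toy model of the two retry paths as I understand them. It is not the gem's code: `Connection`, `simulate`, the "first live host" selection, and the use of IOError and RuntimeError as stand-ins for a transport error and a retry_on_status error are all assumptions made for illustration.

```ruby
# Toy model only; names and structure are hypothetical, not the gem's internals.
Connection = Struct.new(:host, :dead)

# Returns the list of hosts that were tried, in order.
def simulate(hosts, retry_on_failure:, failure:)
  connections = hosts.map { |h| Connection.new(h, false) }
  # Assumption from this issue: the retry budget scales with the host count.
  max_retries = retry_on_failure * hosts.size
  attempts = []
  tries = 0

  begin
    tries += 1
    # Revive every host once all of them have been marked dead.
    connections.each { |c| c.dead = false } if connections.all?(&:dead)
    conn = connections.find { |c| !c.dead }
    attempts << conn.host
    raise failure
  rescue IOError
    # Transport-error path: the host is marked dead first,
    # so the retry picks a different host.
    conn.dead = true
    retry if tries <= max_retries
  rescue RuntimeError
    # retry_on_status path: retry is called immediately,
    # so the same (still "alive") host is picked again.
    retry if tries <= max_retries
  end

  attempts
end
```

Calling `simulate(%w[foobar-us-east-1 bazbat-us-east-2], retry_on_failure: 3, failure: IOError)` yields 7 attempts that alternate between the two hosts, while passing `failure: RuntimeError` yields 7 attempts that all go to foobar-us-east-1.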

Version

We are still way back on version 5.0.4, but this behavior appears to be the same all the way through 7.7.0

Example

Here are logs from a client with two hosts, ["foobar-us-east-1", "bazbat-us-east-2"], retry_on_failure: 3, retry_on_status: [503]

Expected behavior, and the current behavior of TransportErrors:

{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 1 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 2 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 3 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 4 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 5 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 6 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 7 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"FATAL","msg":"[Faraday::ConnectionFailed] Cannot connect to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"} after 7 tries"}

Note how attempts bounce back and forth between foobar-us-east-1 and bazbat-us-east-2

Current behavior of retry_on_status errors:

{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 1 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 2 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 3 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 4 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 5 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 6 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 7 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"FATAL","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Cannot get response from http://foobar-us-east-1:9200/_search after 7 tries"}

Note how attempts are only made to foobar-us-east-1, and there are retry_on_failure * number of hosts + 1 of them (3 * 2 + 1 = 7).
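
For comparison, this is a small sketch of the attempt sequence I would expect for a retry_on_status error if hosts were rotated the same way as for transport errors. `expected_attempts` is a hypothetical helper written for this issue, not part of the client, and it assumes plain round-robin selection.

```ruby
# Hypothetical helper: the order of hosts one would expect to be tried
# if every retry moved on to the next host in round-robin order.
def expected_attempts(hosts, retry_on_failure)
  total = retry_on_failure * hosts.size + 1 # first attempt plus all retries
  hosts.cycle.take(total)
end

hosts = ["foobar-us-east-1", "bazbat-us-east-2"]
sequence = expected_attempts(hosts, 3)

puts sequence.size                      # total number of requests
puts sequence.count("foobar-us-east-1") # requests sent to the first host
puts sequence.count("bazbat-us-east-2") # requests sent to the second host
```

With the settings from the example above, that is 7 requests split 4 and 3 across the two hosts, rather than all 7 landing on foobar-us-east-1.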
