-
Notifications
You must be signed in to change notification settings - Fork 25.4k
ConnectTransportException returns retryable BAD_GATEWAY #118681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ConnectTransportException returns retryable BAD_GATEWAY #118681
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
Hi @DiannaHohensee, I've created a changelog YAML for you. |
I think that should be IllegalStateException, because we already have remote connection to another cluster. Why it needs REST status attached? Do we send it back to the REST handler and client? |
Thanks for taking a look and confirming it seems strange, @mhl-b. I'll take a closer look at Proxy* -- I'm not familiar with it, atm. I forgot to reference the ticket, but this relates to ES-10214 -- I just updated my description to include it. I don't myself know how the HTTP error code propagates back to the caller, but other folks believe that it does. I would think it could reasonably end up in a stack trace of 'caused by' errors, if not directly. |
Hmm. I think IllegalStateException would indicate ES has done something wrong. Whereas ES looks like it's throwing the ConnectTransportException when the information from the user doesn't match up with reality: like setting an address to a cluster and a cluster name, and then ES verifies that the connected to cluster's name matches, and it doesn't. I think we need a "user gave us a bad parameter" non-retryable exception? There is some more code throwing the ConnectTransportException, and I think the idea is that we connected successfully to some server, but the type of server is not what ES expected to find. |
After reviewing the @DaveCTurner for thoughts. |
My reading of the HTTP spec is that we are indeed acting as a gateway in this situation:
Moreover, if the remote node turns out to belong to a cluster with an unexpected name then
There's other ways that a bad in-cluster config might lead to a It's definitely not a client-side failure so we have to return some kind of 5xx error, and 502 seems more appropriate than 500. If clients choose to retry on all 502s then 🤷 that's not wrong I guess. |
Would be nice to have an integ or yaml/rest test to show rest status propagation. I dont think status is very important, 500 or 502, since we model REST over HTTP and there is no strict requirement to confine to error codes, as David said there are not enough of them. So client still need to unwrap 5xx HTTP response to make decision, including retry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except a couple of nits
server/src/main/java/org/elasticsearch/transport/ConnectTransportException.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/transport/ConnectTransportException.java
Outdated
Show resolved
Hide resolved
…ortException.java Co-authored-by: David Turner <david.turner@elastic.co>
…ortException.java Co-authored-by: David Turner <david.turner@elastic.co>
Hi @DiannaHohensee, I've updated the changelog YAML for you. |
In the related SDHE, Kibana was seeing |
The test failures match #118846, so they are not a blocker. |
ConnectTransportException and its subclasses previous translated to a INTERNAL_SERVER_ERROR HTTP 500 code. We are changing it to 502 BAD_GATEWAY so that users may choose to retry it on connectivity issues. Related ES-10214 Closes elastic#118320
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
ConnectTransportException and its subclasses previous translated to a INTERNAL_SERVER_ERROR HTTP 500 code. We are changing it to 502 BAD_GATEWAY so that users may choose to retry it on connectivity issues. Related ES-10214 Closes elastic#118320 (cherry picked from commit 517abe4)
…19146) ConnectTransportException and its subclasses previous translated to a INTERNAL_SERVER_ERROR HTTP 500 code. We are changing it to 502 BAD_GATEWAY so that users may choose to retry it on connectivity issues. Related ES-10214 Closes #118320 (cherry picked from commit 517abe4) Co-authored-by: Dianna Hohensee <dianna.hohensee@elastic.co>
I took a spin through the code with
new ConnectTransportException
, and saw this error case. That one doesn't seem like it should be a retryable error 🤔 Is this status code change safe, or maybe we should change that code to a different exception type?Related ES-10214
Closes #118320