-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Simplified RRF Retriever #129659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplified RRF Retriever #129659
Conversation
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
Hi @Mikep86, I've created a changelog YAML for you. |
Pinging @elastic/search-eng (Team:SearchOrg) |
Pinging @elastic/search-relevance (Team:Search - Relevance) |
// TODO: Refactor duplicate code | ||
// Using the multi-fields query format | ||
var localIndicesMetadata = resolvedIndices.getConcreteLocalIndicesMetadata(); | ||
if (localIndicesMetadata.size() > 1) { | ||
throw new IllegalArgumentException( | ||
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying multiple indices" | ||
); | ||
} else if (resolvedIndices.getRemoteClusterIndices().isEmpty() == false) { | ||
throw new IllegalArgumentException( | ||
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying remote indices" | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kderusso I know you requested that we refactor this common code, can we handle that in a follow up along with refactoring the common test code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, given the timing I'm fine with that happening in a followup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I am OK with the cleanup/consolidation being a followup, a few questions on tests plus we should update the docs. Approving to not block.
.../rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/rrf/310_rrf_retriever_simplified.yml
Show resolved
Hide resolved
// TODO: Refactor duplicate code | ||
// Using the multi-fields query format | ||
var localIndicesMetadata = resolvedIndices.getConcreteLocalIndicesMetadata(); | ||
if (localIndicesMetadata.size() > 1) { | ||
throw new IllegalArgumentException( | ||
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying multiple indices" | ||
); | ||
} else if (resolvedIndices.getRemoteClusterIndices().isEmpty() == false) { | ||
throw new IllegalArgumentException( | ||
"[" + NAME + "] cannot specify [" + QUERY_FIELD.getPreferredName() + "] when querying remote indices" | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, given the timing I'm fine with that happening in a followup
+ " }" | ||
+ " }" | ||
+ "}"; | ||
String restContent = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we parse this twice, once with retrievers and once with field/query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is only for XContent parsing purposes, the resulting retriever does not need to pass SearchRequest
validation
"foo" | ||
); | ||
|
||
// Non-default rank window size and rank constant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand that this is testing anything with rank window size or rank constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's testing that the rank window size and rank constant are propagated to the rewritten retrievers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I liked the tests in 310_rrf_retriever_simplified.yml
. It made much easier to understand the required changes.
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Show resolved
Hide resolved
x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRetrieverBuilder.java
Show resolved
Hide resolved
...rrf/src/yamlRestTest/java/org/elasticsearch/xpack/rank/rrf/RRFRankClientYamlTestSuiteIT.java
Show resolved
Hide resolved
.../rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/rrf/310_rrf_retriever_simplified.yml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
(cherry picked from commit 636da86) # Conflicts: # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRetrieverBuilder.java # x-pack/plugin/rank-rrf/src/test/java/org/elasticsearch/xpack/rank/rrf/RRFRetrieverBuilderTests.java
(cherry picked from commit 636da86) # Conflicts: # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRetrieverBuilder.java # x-pack/plugin/rank-rrf/src/test/java/org/elasticsearch/xpack/rank/rrf/RRFRetrieverBuilderTests.java
Adds a simplified syntax for the
rrf
retriever:fields
is optional. If it is not provided, we query the fields defined by theindex.query.default_field
index setting (which is*
by default).This syntax automatically handles querying a mix of lexical fields (i.e. fields that support lexical search via
match
) andsemantic_text
fields. The fields are divided into lexical and semantic groups to create a 50/50 weight distribution between the two in the final ranking. This is achieved by creating a retriever tree that looks like:This is a sibling of the simplified
linear
retriever, which was added in #129200.