[ES|QL] Rerank operator improvements #132318
Pinging @elastic/es-search-relevance (Team:Search Relevance)
@@ -25,6 +25,9 @@
},
"year": {
"type": "integer"
},
"collection": {
ℹ️ Added a new column to the dataset with sparse data, so we can test some sparse behavior.
;

book_no:keyword | title:text | author:text | collection:text | rerank_score:double | _score:double
2714 | Return of the King Being the Third Part of The Lord of the Rings | J. R. R. Tolkien | The Lord of the Rings | 0.04761905 | 8.56
ℹ️ Testing that reranking returns null when the input field is null.
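The null behavior described in this comment can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: `SparseRerankSketch`, `score`, and the `model` callback are hypothetical names introduced here, and the real operator works on ES|QL blocks rather than Java lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper for illustration only; not part of the Elasticsearch code base.
final class SparseRerankSketch {
    /**
     * Scores each input with the given model. A null input yields a null score
     * rather than 0, because 0 could be mistaken for a genuine model score.
     */
    static List<Double> score(List<String> inputs, ToDoubleFunction<String> model) {
        List<Double> scores = new ArrayList<>(inputs.size());
        for (String input : inputs) {
            // The model is never called for a missing input.
            scores.add(input == null ? null : model.applyAsDouble(input));
        }
        return scores;
    }
}
```

Calling `SparseRerankSketch.score(Arrays.asList("a", null), String::length)` keeps the positions aligned with the input rows and leaves the second score as null.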
| KEEP book_no, title, ratings, _score
;

book_no:keyword | title:text | ratings:double | _score:double
ℹ️ It is stupid to rerank on a number but at least it does not break.
| KEEP book_no, title, ratings, _score
;

book_no:keyword | title:text | ratings:double | _score:double
ℹ️ Combining text and non-text fields. They are encoded in a YAML document that is passed to the reranker.
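As a rough picture of what encoding several fields into one YAML document can look like, here is a simplified sketch. It is not the real `XContentRowEncoder`: `RowYamlSketch` and `encode` are hypothetical names, the output does no quoting or escaping, and skipping null fields is an assumption made for this example. It does mirror two behaviors mentioned in this PR: no leading whitespace in the output, and a null result when every field is null.

```java
import java.util.Map;

// Hypothetical, simplified stand-in; this is NOT the real XContentRowEncoder.
final class RowYamlSketch {
    /** Returns "name: value" lines for the non-null fields, or null when every field is null. */
    static String encode(Map<String, Object> row) {
        StringBuilder yaml = new StringBuilder();
        for (Map.Entry<String, Object> field : row.entrySet()) {
            if (field.getValue() == null) {
                continue; // assumption: a sparse (null) field is left out of the document
            }
            if (yaml.length() > 0) {
                yaml.append('\n'); // separator only between entries, so no leading whitespace
            }
            // No quoting or escaping here: only safe for simple scalar values.
            yaml.append(field.getKey()).append(": ").append(field.getValue());
        }
        return yaml.length() == 0 ? null : yaml.toString();
    }
}
```

Passing an insertion-ordered map such as a `LinkedHashMap` keeps the fields in the order they were listed in the query.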
if (castRerankFieldsAsString
    && rerank.isValidRerankField(resolved)
    && DataType.isString(resolved.dataType()) == false) {
    resolved = resolved.replaceChild(new ToString(resolved.child().source(), resolved.child()));
ℹ️ Casting non-text input fields to string.
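The rule in this hunk can be reduced to a small standalone sketch. Everything below is a stand-in: `RerankCastSketch`, `FieldType`, and `rerankInput` are hypothetical and only mimic the shape of the analyzer check, using a string rendering of the ES|QL `TO_STRING` function instead of real expression nodes.

```java
// Hypothetical miniature of the casting rule; these types are stand-ins, not ES|QL classes.
final class RerankCastSketch {
    enum FieldType { KEYWORD, TEXT, INTEGER, DOUBLE, BOOLEAN }

    static boolean isString(FieldType type) {
        return type == FieldType.KEYWORD || type == FieldType.TEXT;
    }

    /** Returns the expression fed to the reranker, wrapping non-string fields in a string cast. */
    static String rerankInput(String fieldName, FieldType type, boolean castAsString) {
        if (castAsString && isString(type) == false) {
            return "TO_STRING(" + fieldName + ")";
        }
        return fieldName; // already a string, or casting is disabled
    }
}
```

With the flag enabled, `rerankInput("ratings", FieldType.DOUBLE, true)` yields `TO_STRING(ratings)`, while a text field such as `title` passes through unchanged.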
LGTM
This PR introduces several enhancements to ES|QL's `RERANK` command.

`RERANK` Input Validation:
- Non-string input fields are cast to string when a single field is used. On multiple fields, the whole content is encoded in YAML, so the cast is not necessary.
- Tests added in `AnalyzerTests` for supported / unsupported field types.

Sparse Data Handling:
- Updated the `RERANK` operator to correctly handle null or missing values in input fields.
- When the input is null, the score is `null` (0 does not make sense in the context of a reranker model, since the minimum score can be < 0).
- Updated `XContentRowEncoder` (in charge of the YAML conversion when multiple fields are used) so that it returns `null` if all fields are `null` (it produced an empty YAML document before).

Bug Fixes & Testing:
- Fixed a bug in `XContentRowEncoder` that caused a leading space in the output.
- A new test class (`XContentRowEncoderTests`) has been added to cover the functionality of the `XContentRowEncoder` and prevent future regressions.
- Tests for `RERANK` and `COMPLETION` have been updated to use a new test helper for reading block data and to assert correct behavior with sparse inputs.