Skip to content

Add match support for semantic_text fields #117839

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

kderusso
Copy link
Member

@kderusso kderusso commented Dec 2, 2024

Followup on #112166

Implements match query support for semantic_text fields.

The following design changes were made from the original POC:

  • Replaces the idea of the query builder service by a query rewrite interceptor that can be defined by plugins.
  • match queries on combined text and semantic_text field will no longer error but will return results.

Example script of how to use it:

PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  },
  "task_settings": {}
}

PUT my-index-1
{
    "mappings": {
        "properties": {
            "inference_field": {
                "type": "semantic_text",
                "inference_id": "my-elser-endpoint"
            },
            "text_field": {
                "type": "text"
            }
        }
    }
}

PUT my-index-2
{
    "mappings": {
        "properties": {
            "inference_field": {
                "type": "text"
            },
            "text_field": {
                "type": "text"
            }
        }
    }
}

POST my-index-1/_doc/1
{
  "inference_field": "a test value",
  "text_field": "a test value"
}

POST my-index-2/_doc/1
{
  "inference_field": "a test value",
  "text_field": "a test value"
}

GET my-index-1/_search
{
  "query": {
    "match": {
      "inference_field": "test"
    }
  }
}

GET my-index-1/_search
{
  "query": {
    "match": {
      "text_field": "test"
    }
  }
}

GET my-index-*/_search
{
  "query": {
    "match": {
      "inference_field": "test"
    }
  }
}

@kderusso kderusso force-pushed the kderusso/semantic-text-match-query-support branch from 34af576 to bcb93c3 Compare December 2, 2024 19:03
@kderusso kderusso changed the title WIP: Add match support for semantic_text fields Add match support for semantic_text fields Dec 3, 2024
@kderusso kderusso added >enhancement :SearchOrg/Relevance Label for the Search (solution/org) Relevance team labels Dec 3, 2024
Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I left a few comments, mostly nits.

}

public QueryBuilder interceptAndRewrite(QueryRewriteContext context, QueryBuilder queryBuilder) {
assert (queryBuilder instanceof MatchQueryBuilder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question: how does this interceptor know to only intercept match queries?

Update: After reviewing the code further, I understand how this interceptor only intercepts match queries. However, this behavior is based on an implementation detail that I think could easily trip us up later. We use QueryRewriteInterceptor#multi to create an interceptor lookup map. However, QueryRewriteContext will accept any QueryRewriteInterceptor, including one not created by QueryRewriteInterceptor#multi. Could we update the classes and method signatures to to enforce that that the rewrite context should take a map of interceptors?

We don't need to do it in this PR, but I think it's something good for a follow-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it. I think that we certainly could do that, and I'd argue that if we decide to go down that route, maybe we get rid of the CompositeQueryRewriteInterceptor concept at all in favor of a service class that will do the right thing. I'll think about it, but I'd like to do that in a followup if we decide that's the best direction to go in, as it won't impact transport serialization or other features.

Comment on lines 59 to 65
for (String inferenceIndexName : inferenceIndices) {
// Add a separate clause for each semantic query, because they may be using different inference endpoints
// TODO - consolidate this to a single clause once the semantic query supports multiple inference endpoints
boolQueryBuilder.should(
createSemanticSubQuery(inferenceIndexName, matchQueryBuilder.fieldName(), (String) matchQueryBuilder.value())
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match query support with multiple inference endpoints is already broken (due to the index filter clause running as a post-filter), so we can consolidate to a single clause now.

Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing the rewriting test!

rewritten = matchQueryBuilder.rewrite(context);
assertTrue(rewritten instanceof InterceptedQueryBuilderWrapper);
assertTrue(((InterceptedQueryBuilderWrapper) rewritten).queryBuilder instanceof MatchQueryBuilder);
MatchQueryBuilder rewrittenMatchQueryBuilder = (MatchQueryBuilder) ((InterceptedQueryBuilderWrapper) rewritten).queryBuilder;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test! 💯

Should we test that an additional rewrite on an intercepted query returns the same query, so we know the rewrite cycle ends?

…textBuilder.java

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
@kderusso kderusso enabled auto-merge (squash) December 12, 2024 13:16
@kderusso kderusso merged commit c9a6a2c into elastic:main Dec 12, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 117839

@kderusso
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

kderusso added a commit to kderusso/elasticsearch that referenced this pull request Dec 12, 2024
* Added query name to inference field metadata

* Fix build error

* Added query builder service

* Add query builder service to query rewrite context

* Updated match query to support querying semantic text fields

* Fix build error

* Fix NPE

* Update the POC to rewrite to a bool query when combined inference and non-inference fields

* Separate clause for each inference index (to avoid inference ID clashes)

* Simplify query builder service concept to a single default inference query

* Rename QueryBuilderService, remove query name from inference metadata

* Fix too many rewrite rounds error by injecting booleans in constructors for match query builder and semantic text

* Fix test compilation errors

* Fix tests

* Add yaml test for semantic match

* Add NodeFeature

* Fix license headers

* Spotless

* Updated getClass comparison in MatchQueryBuilder

* Cleanup

* Add Mock Inference Query Builder Service

* Spotless

* Cleanup

* Update docs/changelog/117839.yaml

* Update changelog

* Replace the default inference query builder with a query rewrite interceptor

* Cleanup

* Some more cleanup/renames

* Some more cleanup/renames

* Spotless

* Checkstyle

* Convert List<QueryRewriteInterceptor> to Map keyed on query name, error on query name collisions

* PR feedback - remove check on QueryRewriteContext class only

* PR feedback

* Remove intercept flag from MatchQueryBuilder and replace with wrapper

* Move feature to test feature

* Ensure interception happens only once

* Rename InterceptedQueryBuilderWrapper to AbstractQueryBuilderWrapper

* Add lenient field to SemanticQueryBuilder

* Clean up yaml test

* Add TODO comment

* Add comment

* Spotless

* Rename AbstractQueryBuilderWrapper back to InterceptedQueryBuilderWrapper

* Spotless

* Didn't mean to commit that

* Remove static class wrapping the InterceptedQueryBuilderWrapper

* Make InterceptedQueryBuilderWrapper part of QueryRewriteInterceptor

* Refactor the interceptor to be an internal plugin that cannot be used outside inference plugin

* Fix tests

* Spotless

* Minor cleanup

* C'mon spotless

* Test spotless

* Cleanup InternalQueryRewriter

* Change if statement to assert

* Simplify template of InterceptedQueryBuilderWrapper

* Change constructor of InterceptedQueryBuilderWrapper

* Refactor InterceptedQueryBuilderWrapper to extend QueryBuilder

* Cleanup

* Add test

* Spotless

* Rename rewrite to interceptAndRewrite in QueryRewriteInterceptor

* DOESN'T WORK - for testing

* Add comment

* Getting closer - match on single typed fields works now

* Deleted line by mistake

* Checkstyle

* Fix over-aggressive IntelliJ Refactor/Rename

* And another one

* Move SemanticMatchQueryRewriteInterceptor.SEMANTIC_MATCH_QUERY_REWRITE_INTERCEPTION_SUPPORTED to Test feature

* PR feedback

* Require query name with no default

* PR feedback & update test

* Add rewrite test

* Update server/src/main/java/org/elasticsearch/index/query/InnerHitContextBuilder.java

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>

---------

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
(cherry picked from commit c9a6a2c)
kderusso added a commit that referenced this pull request Dec 12, 2024
* Added query name to inference field metadata

* Fix build error

* Added query builder service

* Add query builder service to query rewrite context

* Updated match query to support querying semantic text fields

* Fix build error

* Fix NPE

* Update the POC to rewrite to a bool query when combined inference and non-inference fields

* Separate clause for each inference index (to avoid inference ID clashes)

* Simplify query builder service concept to a single default inference query

* Rename QueryBuilderService, remove query name from inference metadata

* Fix too many rewrite rounds error by injecting booleans in constructors for match query builder and semantic text

* Fix test compilation errors

* Fix tests

* Add yaml test for semantic match

* Add NodeFeature

* Fix license headers

* Spotless

* Updated getClass comparison in MatchQueryBuilder

* Cleanup

* Add Mock Inference Query Builder Service

* Spotless

* Cleanup

* Update docs/changelog/117839.yaml

* Update changelog

* Replace the default inference query builder with a query rewrite interceptor

* Cleanup

* Some more cleanup/renames

* Some more cleanup/renames

* Spotless

* Checkstyle

* Convert List<QueryRewriteInterceptor> to Map keyed on query name, error on query name collisions

* PR feedback - remove check on QueryRewriteContext class only

* PR feedback

* Remove intercept flag from MatchQueryBuilder and replace with wrapper

* Move feature to test feature

* Ensure interception happens only once

* Rename InterceptedQueryBuilderWrapper to AbstractQueryBuilderWrapper

* Add lenient field to SemanticQueryBuilder

* Clean up yaml test

* Add TODO comment

* Add comment

* Spotless

* Rename AbstractQueryBuilderWrapper back to InterceptedQueryBuilderWrapper

* Spotless

* Didn't mean to commit that

* Remove static class wrapping the InterceptedQueryBuilderWrapper

* Make InterceptedQueryBuilderWrapper part of QueryRewriteInterceptor

* Refactor the interceptor to be an internal plugin that cannot be used outside inference plugin

* Fix tests

* Spotless

* Minor cleanup

* C'mon spotless

* Test spotless

* Cleanup InternalQueryRewriter

* Change if statement to assert

* Simplify template of InterceptedQueryBuilderWrapper

* Change constructor of InterceptedQueryBuilderWrapper

* Refactor InterceptedQueryBuilderWrapper to extend QueryBuilder

* Cleanup

* Add test

* Spotless

* Rename rewrite to interceptAndRewrite in QueryRewriteInterceptor

* DOESN'T WORK - for testing

* Add comment

* Getting closer - match on single typed fields works now

* Deleted line by mistake

* Checkstyle

* Fix over-aggressive IntelliJ Refactor/Rename

* And another one

* Move SemanticMatchQueryRewriteInterceptor.SEMANTIC_MATCH_QUERY_REWRITE_INTERCEPTION_SUPPORTED to Test feature

* PR feedback

* Require query name with no default

* PR feedback & update test

* Add rewrite test

* Update server/src/main/java/org/elasticsearch/index/query/InnerHitContextBuilder.java

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>

---------

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
(cherry picked from commit c9a6a2c)
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
…118587)

* Added query name to inference field metadata

* Fix build error

* Added query builder service

* Add query builder service to query rewrite context

* Updated match query to support querying semantic text fields

* Fix build error

* Fix NPE

* Update the POC to rewrite to a bool query when combined inference and non-inference fields

* Separate clause for each inference index (to avoid inference ID clashes)

* Simplify query builder service concept to a single default inference query

* Rename QueryBuilderService, remove query name from inference metadata

* Fix too many rewrite rounds error by injecting booleans in constructors for match query builder and semantic text

* Fix test compilation errors

* Fix tests

* Add yaml test for semantic match

* Add NodeFeature

* Fix license headers

* Spotless

* Updated getClass comparison in MatchQueryBuilder

* Cleanup

* Add Mock Inference Query Builder Service

* Spotless

* Cleanup

* Update docs/changelog/117839.yaml

* Update changelog

* Replace the default inference query builder with a query rewrite interceptor

* Cleanup

* Some more cleanup/renames

* Some more cleanup/renames

* Spotless

* Checkstyle

* Convert List<QueryRewriteInterceptor> to Map keyed on query name, error on query name collisions

* PR feedback - remove check on QueryRewriteContext class only

* PR feedback

* Remove intercept flag from MatchQueryBuilder and replace with wrapper

* Move feature to test feature

* Ensure interception happens only once

* Rename InterceptedQueryBuilderWrapper to AbstractQueryBuilderWrapper

* Add lenient field to SemanticQueryBuilder

* Clean up yaml test

* Add TODO comment

* Add comment

* Spotless

* Rename AbstractQueryBuilderWrapper back to InterceptedQueryBuilderWrapper

* Spotless

* Didn't mean to commit that

* Remove static class wrapping the InterceptedQueryBuilderWrapper

* Make InterceptedQueryBuilderWrapper part of QueryRewriteInterceptor

* Refactor the interceptor to be an internal plugin that cannot be used outside inference plugin

* Fix tests

* Spotless

* Minor cleanup

* C'mon spotless

* Test spotless

* Cleanup InternalQueryRewriter

* Change if statement to assert

* Simplify template of InterceptedQueryBuilderWrapper

* Change constructor of InterceptedQueryBuilderWrapper

* Refactor InterceptedQueryBuilderWrapper to extend QueryBuilder

* Cleanup

* Add test

* Spotless

* Rename rewrite to interceptAndRewrite in QueryRewriteInterceptor

* DOESN'T WORK - for testing

* Add comment

* Getting closer - match on single typed fields works now

* Deleted line by mistake

* Checkstyle

* Fix over-aggressive IntelliJ Refactor/Rename

* And another one

* Move SemanticMatchQueryRewriteInterceptor.SEMANTIC_MATCH_QUERY_REWRITE_INTERCEPTION_SUPPORTED to Test feature

* PR feedback

* Require query name with no default

* PR feedback & update test

* Add rewrite test

* Update server/src/main/java/org/elasticsearch/index/query/InnerHitContextBuilder.java

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>

---------

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
(cherry picked from commit c9a6a2c)
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
…118587)

* Added query name to inference field metadata

* Fix build error

* Added query builder service

* Add query builder service to query rewrite context

* Updated match query to support querying semantic text fields

* Fix build error

* Fix NPE

* Update the POC to rewrite to a bool query when combined inference and non-inference fields

* Separate clause for each inference index (to avoid inference ID clashes)

* Simplify query builder service concept to a single default inference query

* Rename QueryBuilderService, remove query name from inference metadata

* Fix too many rewrite rounds error by injecting booleans in constructors for match query builder and semantic text

* Fix test compilation errors

* Fix tests

* Add yaml test for semantic match

* Add NodeFeature

* Fix license headers

* Spotless

* Updated getClass comparison in MatchQueryBuilder

* Cleanup

* Add Mock Inference Query Builder Service

* Spotless

* Cleanup

* Update docs/changelog/117839.yaml

* Update changelog

* Replace the default inference query builder with a query rewrite interceptor

* Cleanup

* Some more cleanup/renames

* Some more cleanup/renames

* Spotless

* Checkstyle

* Convert List<QueryRewriteInterceptor> to Map keyed on query name, error on query name collisions

* PR feedback - remove check on QueryRewriteContext class only

* PR feedback

* Remove intercept flag from MatchQueryBuilder and replace with wrapper

* Move feature to test feature

* Ensure interception happens only once

* Rename InterceptedQueryBuilderWrapper to AbstractQueryBuilderWrapper

* Add lenient field to SemanticQueryBuilder

* Clean up yaml test

* Add TODO comment

* Add comment

* Spotless

* Rename AbstractQueryBuilderWrapper back to InterceptedQueryBuilderWrapper

* Spotless

* Didn't mean to commit that

* Remove static class wrapping the InterceptedQueryBuilderWrapper

* Make InterceptedQueryBuilderWrapper part of QueryRewriteInterceptor

* Refactor the interceptor to be an internal plugin that cannot be used outside inference plugin

* Fix tests

* Spotless

* Minor cleanup

* C'mon spotless

* Test spotless

* Cleanup InternalQueryRewriter

* Change if statement to assert

* Simplify template of InterceptedQueryBuilderWrapper

* Change constructor of InterceptedQueryBuilderWrapper

* Refactor InterceptedQueryBuilderWrapper to extend QueryBuilder

* Cleanup

* Add test

* Spotless

* Rename rewrite to interceptAndRewrite in QueryRewriteInterceptor

* DOESN'T WORK - for testing

* Add comment

* Getting closer - match on single typed fields works now

* Deleted line by mistake

* Checkstyle

* Fix over-aggressive IntelliJ Refactor/Rename

* And another one

* Move SemanticMatchQueryRewriteInterceptor.SEMANTIC_MATCH_QUERY_REWRITE_INTERCEPTION_SUPPORTED to Test feature

* PR feedback

* Require query name with no default

* PR feedback & update test

* Add rewrite test

* Update server/src/main/java/org/elasticsearch/index/query/InnerHitContextBuilder.java

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>

---------

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
(cherry picked from commit c9a6a2c)
@kderusso kderusso deleted the kderusso/semantic-text-match-query-support branch January 24, 2025 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged backport pending >enhancement :Search Relevance/Search Catch all for Search Relevance :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants