Skip to content

Restore model registry validation for the semantic text field #127285

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 29, 2025

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Apr 23, 2025

This PR restores the changes reverted in #127075 and fixed the bug that caused the test failures.
The resolved model is now only used for validation during the creation of the semantic text field
and the final value is set on the first ingestion to ensure consistency.
This change also sets the default for semantic_text field using dense vectors to BBQ.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Apr 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @jimczi, I've created a changelog YAML for you.

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this PR together. I'm not sure I quite understand based on the commit history what was causing this bug - it looks like the changes after the revert were pretty minimal. Would it be possible to walk me through how this fixed the original test failures so I can better understand if we want to proactively add more tests? Thanks!

@jimczi
Copy link
Contributor Author

jimczi commented Apr 24, 2025

Would it be possible to walk me through how this fixed the original test failures so I can better understand if we want to proactively add more tests?

The issue comes up when an inference endpoint gets created after the semantic field that points to it. In that case, the model only gets resolved when the mapping is updated. What’s happening in this bug is that the first ingestion triggers that update. We get the model definition from the bulk request and try to do a dynamic mapping update.

Dynamic updates run on the master node. There, it parses the current mapping (which doesn’t have the model settings yet) and merges it with the update. But since the model was already resolved via the registry during the initial parse, the system sees the update as a no-op which isn’t allowed during bulk requests.

The fix is to skip setting the model settings in the mapping if the model was only resolved through the registry. We still use the resolved model to build sub-fields and check the settings, but we don’t actually include it in the field mapping just yet. Then, when the first ingestion happens, the model settings get properly added through that dynamic update.

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look reasonable to me, and I haven't heard any feedback in the channel w.r.t. adding additional tests, so approving. Thanks for the fix!

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found the new test I was looking for. Small nit about the test location.

@jimczi
Copy link
Contributor Author

jimczi commented Apr 29, 2025

@elasticmachine run elasticsearch-ci/part-3

@jimczi
Copy link
Contributor Author

jimczi commented Apr 29, 2025

@elasticmachine run elasticsearch-ci

@jimczi jimczi merged commit 85d375c into elastic:main Apr 29, 2025
17 checks passed
@jimczi jimczi deleted the model_registry_revert_back branch April 29, 2025 09:36
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Apr 30, 2025
This PR is a partial backport of elastic#127285 that fixes the validation of the inference
id when mappings are restored or dynamically updated.
This change doesn't include defaulting semantic text dense vector to BBQ since it requires
elastic#124581 to be backported first.
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Apr 30, 2025
This PR is a partial backport of elastic#127285 that fixes the validation of the inference
id when mappings are restored or dynamically updated.
This change doesn't include defaulting semantic text dense vector to BBQ since it requires
elastic#124581 to be backported first.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants