
[ML] Add internal action to return the Rerank window size #132169


Open · wants to merge 10 commits into main

Conversation

davidkyle (Member)

The internal action is given an inference ID and returns the maximum number of words allowed in a rerank request. Initially either 250 or 500 words is returned, but the logic can be enhanced and tailored for each inference service.

A new RerankingInferenceService interface is defined to expose the window size; all services that support rerank must implement this interface. To verify this, all inference service unit tests now extend InferenceServiceTestCase, which checks that if a service supports the RERANK task type then it must also implement RerankingInferenceService.
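The description above can be sketched as a minimal interface. This is a hypothetical reconstruction: the interface name and the two default constants follow the snippets quoted later in this review, but `ExampleRerankService` and the exact method shape are illustrative, not the PR's actual code.

```java
// Hypothetical sketch of the RerankingInferenceService interface described
// above; constant and method names follow the snippets quoted in this PR,
// but the real implementation may differ.
interface RerankingInferenceService {
    // Default rerank window sizes in words (the PR initially returns 250 or 500).
    int CONSERVATIVE_DEFAULT_WINDOW_SIZE = 250;
    int LARGE_DEFAULT_WINDOW_SIZE = 500;

    // Maximum number of words allowed in a rerank request for this endpoint's model.
    int rerankerWindowSize(String modelId);
}

// Illustrative service that opts into the conservative default.
class ExampleRerankService implements RerankingInferenceService {
    @Override
    public int rerankerWindowSize(String modelId) {
        return CONSERVATIVE_DEFAULT_WINDOW_SIZE;
    }
}

public class RerankWindowSketch {
    public static void main(String[] args) {
        RerankingInferenceService service = new ExampleRerankService();
        System.out.println(service.rerankerWindowSize("my-rerank-endpoint")); // prints 250
    }
}
```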

@elasticsearchmachine added the Team:ML label (Meta label for the ML team) on Jul 30, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@Override
public int rerankerWindowSize(String modelId) {
    // TODO rerank chunking should use the same value
    return RerankingInferenceService.CONSERVATIVE_DEFAULT_WINDOW_SIZE;
}
Reviewer (Member):

Is this an accurate value for the elastic reranker? I believe it has 512 max token count which is ~683 words assuming 0.75 tokens/word for English text.
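For reference, the token-to-word conversions used in this thread can be sketched as below. The 0.75 ratio is the reviewers' heuristic for English text, not a fixed property of any model, and the class name is illustrative.

```java
// Estimate a word budget from a model's max token count, using the
// heuristic ratio of 0.75 tokens per English word quoted in this thread.
public class TokenWordEstimate {
    static long approxWords(int maxTokens, double tokensPerWord) {
        return Math.round(maxTokens / tokensPerWord);
    }

    public static void main(String[] args) {
        // 512-token limit (elastic reranker) at 0.75 tokens/word -> ~683 words
        System.out.println(approxWords(512, 0.75)); // prints 683
        // 8192-token limit (Alibaba mGTE) with 1 token = 0.75 words -> 6144 words
        System.out.println(Math.round(8192 * 0.75)); // prints 6144
    }
}
```

Note the two comments in this review use the ratio in opposite directions (tokens per word vs. words per token), which is why 512 tokens maps to ~683 words here but 8192 tokens maps to 6144 words later.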

Reviewer (Member):

Correct. Also, when I tested snippet extraction using the highlighter, the sweet spot was around 2560 characters. I worry this might be too low.

/**
* The default window size for small reranking models.
*/
int CONSERVATIVE_DEFAULT_WINDOW_SIZE = 250;
Reviewer (Member):

Looks like we're using these defaults everywhere, even in services where this may be below the token limit (e.g. Cohere uses the large default but truncates well above this value, assuming ~0.75 tokens/word in English). What is the impact of selecting a value that is too low or too high? Do we plan to update each service with a service-specific value at some point? Can you clarify why 250/500 is a good default?

davidkyle (Member, Author):

I added bespoke settings for the individual services with comments explaining the choices. Happy to revise these.

import static org.hamcrest.Matchers.containsString;

@ESTestCase.WithoutEntitlements // due to dependency issue ES-12435
public class RerankWindowSizeIT extends ESIntegTestCase {
Reviewer (Member):

Are we able to add tests for the cases when the service does not support rerank or the service for the endpoint could not be found?

davidkyle (Member, Author):

It's tricky because it is impossible to create a rerank endpoint for a service that does not support rerank, nor can we create an endpoint that has no service. There isn't really a way to test this, as I cannot create the error conditions. In theory these scenarios should never happen, but the conditions are checked anyway, and if one ever does occur the user will get an error message.

Reviewer (Member):

Makes sense. I wasn't confident that we could mock these situations given that they shouldn't come up. Thanks for clarifying. I think having the errors is good even if we aren't able to make an automated test for them.

@kderusso (Member) left a comment:

Thanks for adding this API so quickly! I have some questions/concerns about the defaults.

@@ -191,6 +192,11 @@ protected ServiceSettings getServiceSettingsFromMap(Map<String, Object> serviceS
return TestServiceSettings.fromMap(serviceSettingsMap);
}

@Override
public int rerankerWindowSize(String modelId) {
    return 333;
}
Reviewer (Member):

Nitpick: this could be parameterized and used in the tests instead of hardcoding the number everywhere.
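A minimal sketch of this nitpick, assuming a shared constant in the test service stub; the class and constant names here are illustrative, not the PR's actual test code.

```java
// Sketch: hoist the hardcoded 333 into one shared constant so the service
// stub and every assertion reference the same definition.
class TestRerankService {
    // Illustrative constant name; the real test class may differ.
    static final int TEST_RERANK_WINDOW_SIZE = 333;

    public int rerankerWindowSize(String modelId) {
        return TEST_RERANK_WINDOW_SIZE;
    }
}

public class ParameterizedWindowExample {
    public static void main(String[] args) {
        // Assertions compare against the constant, not a repeated literal.
        System.out.println(new TestRerankService().rerankerWindowSize("test-model")
            == TestRerankService.TEST_RERANK_WINDOW_SIZE); // prints true
    }
}
```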

// Alibaba's mGTE models support long context windows of up to 8192 tokens.
// Using 1 token = 0.75 words, this translates to approximately 6144 words.
// https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base
return 5000;
Reviewer (Member):

Why do we set this so much lower than the actual token limit? Is it a safety margin?


Labels: :ml/Chunking · :ml Machine learning · >refactoring · Team:ML (Meta label for the ML team) · v9.2.0
4 participants