
[ML] Add internal action to return the Rerank window size #132169


Open · wants to merge 10 commits into main

Conversation

davidkyle (Member)

The internal action is given an inference ID and returns the maximum number of words allowed in a rerank request. Initially either 250 or 500 words is returned, but the logic can be enhanced and tailored for each inference service.

A new RerankingInferenceService interface is defined to expose the window size; all services that support rerank must implement this interface. To verify this, all inference service unit tests now extend InferenceServiceTestCase, which checks that if a service supports the RERANK task type then it must also implement RerankingInferenceService.
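The description above can be sketched as a minimal interface. This is a hypothetical reconstruction: the interface name and the two default constants follow the snippets quoted later in this review, but `ExampleRerankService` and the exact method shape are illustrative, not the PR's actual code.

```java
// Hypothetical sketch of the RerankingInferenceService interface described
// above; constant and method names follow the snippets quoted in this PR,
// but the real implementation may differ.
interface RerankingInferenceService {
    // Default rerank window sizes in words (the PR initially returns 250 or 500).
    int CONSERVATIVE_DEFAULT_WINDOW_SIZE = 250;
    int LARGE_DEFAULT_WINDOW_SIZE = 500;

    // Maximum number of words allowed in a rerank request for this endpoint's model.
    int rerankerWindowSize(String modelId);
}

// Illustrative service that opts into the conservative default.
class ExampleRerankService implements RerankingInferenceService {
    @Override
    public int rerankerWindowSize(String modelId) {
        return CONSERVATIVE_DEFAULT_WINDOW_SIZE;
    }
}

public class RerankWindowSketch {
    public static void main(String[] args) {
        RerankingInferenceService service = new ExampleRerankService();
        System.out.println(service.rerankerWindowSize("my-rerank-endpoint")); // prints 250
    }
}
```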

@elasticsearchmachine added the Team:ML label (Meta label for the ML team) on Jul 30, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@Override
public int rerankerWindowSize(String modelId) {
    // TODO rerank chunking should use the same value
    return RerankingInferenceService.CONSERVATIVE_DEFAULT_WINDOW_SIZE;
}
Reviewer (Member):

Is this an accurate value for the elastic reranker? I believe it has 512 max token count which is ~683 words assuming 0.75 tokens/word for English text.
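For reference, the token-to-word conversions used in this thread can be sketched as below. The 0.75 ratio is the reviewers' heuristic for English text, not a fixed property of any model, and the class name is illustrative.

```java
// Estimate a word budget from a model's max token count, using the
// heuristic ratio of 0.75 tokens per English word quoted in this thread.
public class TokenWordEstimate {
    static long approxWords(int maxTokens, double tokensPerWord) {
        return Math.round(maxTokens / tokensPerWord);
    }

    public static void main(String[] args) {
        // 512-token limit (elastic reranker) at 0.75 tokens/word -> ~683 words
        System.out.println(approxWords(512, 0.75)); // prints 683
        // 8192-token limit (Alibaba mGTE) with 1 token = 0.75 words -> 6144 words
        System.out.println(Math.round(8192 * 0.75)); // prints 6144
    }
}
```

Note the two comments in this review use the ratio in opposite directions (tokens per word vs. words per token), which is why 512 tokens maps to ~683 words here but 8192 tokens maps to 6144 words later.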

Reviewer (Member):

Correct. Also, when I tested snippet extraction using the highlighter, the sweet spot was around 2560 characters. I worry this might be too low.

/**
* The default window size for small reranking models.
*/
int CONSERVATIVE_DEFAULT_WINDOW_SIZE = 250;
Reviewer (Member):

Looks like we're using these defaults everywhere, even in services where this may be below the token limit (e.g. Cohere uses the large default but truncates well above this value, assuming ~0.75 tokens/word in English). What is the impact of selecting a value that is too low or too high? Do we plan to update each service with a service-specific value at some point? Can you clarify why 250/500 is a good default?

davidkyle (Member, Author):

I added bespoke settings for the individual services with comments explaining the choices. Happy to revise these.

import static org.hamcrest.Matchers.containsString;

@ESTestCase.WithoutEntitlements // due to dependency issue ES-12435
public class RerankWindowSizeIT extends ESIntegTestCase {
Reviewer (Member):

Are we able to add tests for the cases when the service does not support rerank or the service for the endpoint could not be found?

davidkyle (Member, Author):

It's tricky because it is impossible to create a rerank endpoint for a service that does not support rerank, nor can we create an endpoint that has no service. There isn't really a way to test this, as I cannot create the error conditions. In theory these scenarios should never happen, but the conditions are checked anyway, and if one ever does occur the user will get an error message.

Reviewer (Member):

Makes sense. I wasn't confident that we could mock these situations given that they shouldn't come up. Thanks for clarifying. I think having the errors is good even if we aren't able to make an automated test for them.

@kderusso (Member) left a comment:

Thanks for adding this API so quickly! I have some questions/concerns about the defaults.

@@ -191,6 +192,11 @@ protected ServiceSettings getServiceSettingsFromMap(Map<String, Object> serviceS
return TestServiceSettings.fromMap(serviceSettingsMap);
}

@Override
public int rerankerWindowSize(String modelId) {
    return 333;
}
Reviewer (Member):

Nitpick: this could be parameterized and used in the tests instead of hardcoding the number everywhere.
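A minimal sketch of this nitpick, assuming a shared constant in the test service stub; the class and constant names here are illustrative, not the PR's actual test code.

```java
// Sketch: hoist the hardcoded 333 into one shared constant so the service
// stub and every assertion reference the same definition.
class TestRerankService {
    // Illustrative constant name; the real test class may differ.
    static final int TEST_RERANK_WINDOW_SIZE = 333;

    public int rerankerWindowSize(String modelId) {
        return TEST_RERANK_WINDOW_SIZE;
    }
}

public class ParameterizedWindowExample {
    public static void main(String[] args) {
        // Assertions compare against the constant, not a repeated literal.
        System.out.println(new TestRerankService().rerankerWindowSize("test-model")
            == TestRerankService.TEST_RERANK_WINDOW_SIZE); // prints true
    }
}
```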

// Alibaba's mGTE models support long context windows of up to 8192 tokens.
// Using 1 token = 0.75 words, this translates to approximately 6144 words.
// https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base
return 5000;
Reviewer (Member):

Why do we set this so much lower than the actual token limit? Is it a safety margin?


Labels: :ml/Chunking · :ml Machine learning · >refactoring · Team:ML (Meta label for the ML team) · v9.2.0
4 participants