[ML] Add internal action to return the Rerank window size #132169
base: main
Conversation
# Conflicts:
#	server/src/main/java/org/elasticsearch/inference/InferenceService.java
# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/sagemaker/SageMakerServiceTests.java
Pinging @elastic/ml-core (Team:ML)
...ugin/core/src/main/java/org/elasticsearch/xpack/core/inference/action/GetRerankerAction.java
...rence/src/main/java/org/elasticsearch/xpack/inference/action/TransportGetRerankerAction.java
@Override
public int rerankerWindowSize(String modelId) {
    // TODO rerank chunking should use the same value
    return RerankingInferenceService.CONSERVATIVE_DEFAULT_WINDOW_SIZE;
}
Is this an accurate value for the elastic reranker? I believe it has a 512 max token count, which is ~683 words assuming 0.75 tokens/word for English text.
Correct. Also, when I tested snippet extraction using the highlighter, the sweet spot was around 2560 characters, so I worry this might be too low.
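The token-to-word estimate used in this thread can be sketched as a small helper. The 0.75 tokens/word ratio is the reviewers' rough heuristic for English text; the class and method names here are illustrative, not part of the PR:

```java
// Hypothetical helper mirroring the arithmetic in the review comments:
// estimate how many English words fit in a model's token window,
// assuming ~0.75 tokens per word (a rough heuristic, not exact).
class RerankWindowEstimate {

    static int tokensToWords(int maxTokens) {
        // 512 tokens / 0.75 tokens-per-word ~= 683 words
        return (int) Math.round(maxTokens / 0.75);
    }

    public static void main(String[] args) {
        // The elastic reranker's 512-token limit from the comment above.
        System.out.println(tokensToWords(512));
    }
}
```

Under this heuristic a 512-token limit comfortably exceeds the 250-word conservative default, which is the reviewer's point.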
/**
 * The default window size for small reranking models.
 */
int CONSERVATIVE_DEFAULT_WINDOW_SIZE = 250;
Looks like we're using these defaults everywhere, even in services where this may be below the token limit (e.g. Cohere uses the large default but truncates well above this value, assuming ~0.75 tokens/word in English). What is the impact of selecting a value that is too low or too high? Do we plan to update each service with a service-specific value at some point? Can you clarify why 250/500 is a good default?
I added bespoke settings for the individual services with comments explaining the choices. Happy to revise these.
...st/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceServiceTests.java
import static org.hamcrest.Matchers.containsString;

@ESTestCase.WithoutEntitlements // due to dependency issue ES-12435
public class RerankWindowSizeIT extends ESIntegTestCase {
Are we able to add tests for the cases when the service does not support rerank or the service for the endpoint could not be found?
It's tricky: it is impossible to create a rerank endpoint for a service that does not support rerank, nor can we create an endpoint that does not have a service, so I cannot produce the error condition to test against. In theory these scenarios should never happen, but the conditions are checked anyway, and if one ever does occur the user will get an error message.
Makes sense. I wasn't confident that we could mock these situations given that they shouldn't come up. Thanks for clarifying. I think having the errors is good even if we aren't able to make an automated test for them.
Thanks for adding this API so quickly! I have some questions/concerns about the defaults.
@@ -191,6 +192,11 @@ protected ServiceSettings getServiceSettingsFromMap(Map<String, Object> serviceS
    return TestServiceSettings.fromMap(serviceSettingsMap);
}

@Override
public int rerankerWindowSize(String modelId) {
    return 333;
}
Nitpick: this could be parameterized and used in tests instead of hardcoding the number everywhere.
// Alibaba's mGTE models support long context windows of up to 8192 tokens.
// Using 1 token = 0.75 words, this translates to approximately 6144 words.
// https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base
return 5000;
Why do we set this so much lower than the actual token size? Is it a safety concern?
The internal action is given an inference Id and returns the max number of words for a rerank request. Initially either 250 or 500 words is returned, but the logic can be enhanced and tailored for each inference service.

A new `RerankingInferenceService` interface is defined to expose the window size; all services that support rerank must implement this interface. To check that this is the case, all inference service unit tests now extend `InferenceServiceTestCase`, and there is a check that if the service supports the RERANK task type then it must also implement `RerankingInferenceService`.
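The contract described above can be sketched roughly as follows. The interface name, the method signature, and the 250-word constant appear in the PR's quoted diffs; the example service class and the choice to fall back to the conservative default are illustrative assumptions, not the actual Elasticsearch implementation:

```java
// Sketch (not the real implementation) of the RerankingInferenceService
// contract from this PR: services that support rerank expose the max
// number of words to send per rerank request.
interface RerankingInferenceService {

    // The default window size for small reranking models (from the diff).
    int CONSERVATIVE_DEFAULT_WINDOW_SIZE = 250;

    // Max number of words for a rerank request to the given model.
    int rerankerWindowSize(String modelId);
}

// Hypothetical service that falls back to the conservative default,
// mirroring the quoted snippet above.
class ExampleRerankService implements RerankingInferenceService {

    @Override
    public int rerankerWindowSize(String modelId) {
        // TODO rerank chunking should use the same value
        return CONSERVATIVE_DEFAULT_WINDOW_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(new ExampleRerankService().rerankerWindowSize("my-model"));
    }
}
```

The transport action would then look up the service for the inference Id, verify it implements `RerankingInferenceService`, and return this value to the caller.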