
Optimize the batching logic of the embeddings #27


Open
wants to merge 1 commit into base: pypi/0.0.0-alpha

Conversation

Warmybloodwolf

Trying to fix the error "beyond max_tokens_per_request" as follows:

When using the text-embedding-ada-002 model as the embedding model, the following error was encountered during the initialization of the knowledge base:

Failed to generate Embedding: Error code: 400 - {'error': {'message': 'Requested 453357 tokens, max 300000 tokens per request', 'type': 'max_tokens_per_request', 'param': None, 'code': 'max_tokens_per_request'}}

This occurred because all three embedding models provided by OpenAI (text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002) are subject to two kinds of limits: a max input limit of 8192 tokens per input and a max_tokens_per_request limit of 300,000 tokens per request. When embedding the knowledge base, the total number of tokens in a single request exceeded the latter limit, resulting in the error above.

This error was not observed during the previous development phase, likely due to the use of alternative embedding models.

The solution implemented involves estimating the total token consumption and processing the data in batches when the max_tokens_per_request limit is exceeded.

Although this issue may not arise with other embedding models, the proposed solution does not interfere with their operation, so no model-specific branching was introduced.
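
For reference, below is a minimal sketch of the batching idea described above, assuming tiktoken is used for token estimation; the function name, constants, and truncation behaviour are illustrative and do not reproduce the actual code in this commit:

```python
import tiktoken

MAX_TOKENS_PER_REQUEST = 300_000  # per-request limit reported by the API error
MAX_TOKENS_PER_INPUT = 8_192      # max input length of the embedding models

def batch_for_embedding(texts, model="text-embedding-ada-002"):
    """Group texts into batches whose estimated token totals stay under
    the per-request limit; over-long inputs are truncated for simplicity."""
    enc = tiktoken.encoding_for_model(model)
    batches, current, current_tokens = [], [], 0
    for text in texts:
        tokens = enc.encode(text)
        if len(tokens) > MAX_TOKENS_PER_INPUT:
            # Truncate a single over-long input to the model's max input size.
            tokens = tokens[:MAX_TOKENS_PER_INPUT]
            text = enc.decode(tokens)
        if current and current_tokens + len(tokens) > MAX_TOKENS_PER_REQUEST:
            # Start a new batch so each request stays under the limit.
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += len(tokens)
    if current:
        batches.append(current)
    return batches
```

Each batch can then be sent as a separate embeddings request (e.g. `client.embeddings.create(model=..., input=batch)` with the official OpenAI client) and the resulting vectors concatenated in order.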

Used to fix the error "beyond max_tokens_per_request"

Estimate the total tokens consumed and process them in batches when they exceed max_tokens_per_request
code4DB (Collaborator) commented Jun 13, 2025

@Warmybloodwolf Thanks for your efforts!

The documents to be embedded should be well-processed in advance to fit the token limits of the embedding model. That is to say, the processing logic should be placed outside of the embedding model. Thus, this PR might be retained to be merged into the main branch.
