
Optimize the batching logic of the embeddings #27


Open
wants to merge 1 commit into base: pypi/0.0.0-alpha

Conversation

Warmybloodwolf

Trying to fix the error "beyond max_tokens_per_request" as follows:

When using the text-embedding-ada-002 model as the embedding model, the following error was encountered during the initialization of the knowledge base:

Failed to generate Embedding: Error code: 400 - {'error': {'message': 'Requested 453357 tokens, max 300000 tokens per request', 'type': 'max_tokens_per_request', 'param': None, 'code': 'max_tokens_per_request'}}

This occurred because all three embedding models provided by OpenAI (text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002) are subject to two kinds of limits: a max input limit of 8192 tokens per input and a max_tokens_per_request limit of 300,000 tokens per request. When embedding the knowledge base, the total number of tokens in a single request exceeded the latter limit, resulting in the error above.

This error was not observed during the previous development phase, likely due to the use of alternative embedding models.

The solution implemented involves estimating the total token consumption and processing the data in batches when the max_tokens_per_request limit is exceeded.

Although this issue may not arise with other embedding models, the proposed solution does not interfere with their operation, so no model-specific branching was introduced.
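
For reference, below is a minimal sketch of the batching idea described above, assuming tiktoken is used for token estimation; the function name, constants, and truncation behaviour are illustrative and do not reproduce the actual code in this commit:

```python
import tiktoken

MAX_TOKENS_PER_REQUEST = 300_000  # per-request limit reported by the API error
MAX_TOKENS_PER_INPUT = 8_192      # max input length of the embedding models

def batch_for_embedding(texts, model="text-embedding-ada-002"):
    """Group texts into batches whose estimated token totals stay under
    the per-request limit; over-long inputs are truncated for simplicity."""
    enc = tiktoken.encoding_for_model(model)
    batches, current, current_tokens = [], [], 0
    for text in texts:
        tokens = enc.encode(text)
        if len(tokens) > MAX_TOKENS_PER_INPUT:
            # Truncate a single over-long input to the model's max input size.
            tokens = tokens[:MAX_TOKENS_PER_INPUT]
            text = enc.decode(tokens)
        if current and current_tokens + len(tokens) > MAX_TOKENS_PER_REQUEST:
            # Start a new batch so each request stays under the limit.
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += len(tokens)
    if current:
        batches.append(current)
    return batches
```

Each batch can then be sent as a separate embeddings request (e.g. `client.embeddings.create(model=..., input=batch)` with the official OpenAI client) and the resulting vectors concatenated in order.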

Used to fix the error "beyond max_tokens_per_request"

Estimate the total tokens consumed and process them in batches when they exceed max_tokens_per_request
code4DB (Collaborator) commented Jun 13, 2025

@Warmybloodwolf Thanks for your efforts!

The documents to be embedded should be well-processed in advance to fit the token limits of the embedding model. That is to say, the processing logic should be placed outside of the embedding model. Thus, this PR might be retained to be merged into the main branch.
