Confirm this is a feature request for the Python library and not the underlying OpenAI API.
- This is a feature request for the Python library
Describe the feature or improvement you're requesting
When making batch requests using LangChain with an OpenAI model, as shown in this minimal repro, it is common to hit the organizational rate limit for tokens per minute (TPM), as demonstrated in this error log.
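For concreteness, the problematic pattern looks roughly like the following (this is not the linked repro; the `langchain_openai` import path, model name, and prompts are placeholders):

```python
# Illustrative only: a large fan-out like this can exhaust the organization's
# tokens-per-minute budget and surface openai.RateLimitError (HTTP 429).
from langchain_openai import ChatOpenAI  # import path varies with LangChain version

llm = ChatOpenAI(model="gpt-4")
prompts = [f"Summarise document {i}" for i in range(500)]

# Runnable.batch fans the requests out concurrently under the hood.
results = llm.batch(prompts)
```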
Whilst limiting the concurrency of batches and introducing exponential backoff can reduce this issue downstream in LangChain, I believe there is also room for the `OpenAI#request` function in this library to handle parallel invocations more intelligently, so that batch requests are better supported regardless of whether this library, langchain, or another codebase initiates them.
In particular, I would suggest that the `SyncAPIClient` maintain queue(s) of requests and determine when enqueued requests can be run based on the `x-ratelimit-*` and `retry-after` headers of previous responses; a rough sketch follows.
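To make the idea concrete, here is a minimal sketch, independent of the client's internals: the `RateLimitAwareQueue` name, the injected `send` callable, and the single-lock serialisation are illustrative assumptions, not a proposal for the exact implementation.

```python
import re
import threading
import time
from typing import Callable

import httpx


def _parse_reset(value: str) -> float:
    # The x-ratelimit-reset-* headers carry durations such as "6m0s", "59.88s" or "120ms".
    units = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
    return sum(
        float(amount) * units[unit]
        for amount, unit in re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h)", value)
    )


class RateLimitAwareQueue:
    """Serialises requests and delays them based on the previous response's rate-limit headers."""

    def __init__(self, send: Callable[[httpx.Request], httpx.Response]) -> None:
        self._send = send  # stand-in for whatever actually performs the HTTP call
        self._lock = threading.Lock()
        self._resume_at = 0.0  # earliest monotonic time the next request may start

    def request(self, request: httpx.Request) -> httpx.Response:
        # One in-flight request per queue keeps the bookkeeping trivial; a real
        # implementation would presumably allow bounded concurrency instead.
        with self._lock:
            delay = self._resume_at - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            response = self._send(request)
            self._note_headers(response.headers)
            return response

    def _note_headers(self, headers: httpx.Headers) -> None:
        # An explicit retry-after (sent with 429s, in seconds) takes priority;
        # otherwise pause until the token budget resets once it is exhausted.
        retry_after = headers.get("retry-after")
        if retry_after is not None:
            self._resume_at = time.monotonic() + float(retry_after)
        elif headers.get("x-ratelimit-remaining-tokens") == "0":
            reset = headers.get("x-ratelimit-reset-tokens", "0s")
            self._resume_at = time.monotonic() + _parse_reset(reset)
```

Wired in front of the transport (for example, wrapping `httpx.Client.send`), a scheduler like this would let sequential and concurrent callers share the same budget bookkeeping instead of each retrying blindly on 429s.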
Additional context
Related to #937 (comment)