
feature request: proactive client-side rate limiting #1579

@jeswr

Description


Confirm this is a feature request for the Python library and not the underlying OpenAI API.

  • This is a feature request for the Python library

Describe the feature or improvement you're requesting

When making batch requests through LangChain with an OpenAI model, as shown in this minimal repro, it is common to hit the organization's rate limit for tokens per minute (TPM), as demonstrated in this error log.

While limiting batch concurrency and introducing exponential backoff can reduce this issue downstream in LangChain, I believe there is also room for the OpenAI#request function in this library to handle parallel invocations more intelligently and thus better support batch requests, regardless of whether this library, LangChain, or another codebase initiates them.
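
For reference, this is roughly the workaround callers have to write themselves today: a sketch (not from the repro above) that caps concurrency with a semaphore and retries on 429s with exponential backoff. The names `call_with_backoff` and `run_batch`, the semaphore size, and the model choice are all illustrative.

```python
import asyncio
import random

import openai

client = openai.AsyncOpenAI()
semaphore = asyncio.Semaphore(4)  # cap on concurrent requests; 4 is an arbitrary choice


async def call_with_backoff(messages, max_retries=5):
    """Bounded concurrency plus exponential backoff on rate-limit errors."""
    async with semaphore:
        for attempt in range(max_retries):
            try:
                return await client.chat.completions.create(
                    model="gpt-4o-mini", messages=messages
                )
            except openai.RateLimitError:
                # wait 1s, 2s, 4s, ... plus jitter before retrying
                await asyncio.sleep(2**attempt + random.random())
        raise RuntimeError("still rate limited after retries")


async def run_batch(all_messages):
    return await asyncio.gather(*(call_with_backoff(m) for m in all_messages))
```

Every caller reimplements some variant of this, which is the motivation for handling it once inside the client.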

In particular, I would suggest that SyncAPIClient maintain queue(s) of requests and decide when enqueued requests can run based on the x-ratelimit-* and retry-after headers of previous responses.
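
A minimal sketch of the idea, built on the public `with_raw_response` helper rather than modifying SyncAPIClient itself. `HeaderBudget`, `guarded_completion`, the token estimate, and the model name are illustrative, not part of the library; the point is simply to show requests being gated on the headers of earlier responses.

```python
import threading
import time

import openai


class HeaderBudget:
    """Remember the latest rate-limit headers and pause before the budget runs out."""

    def __init__(self):
        self._lock = threading.Lock()
        self._remaining_tokens = None  # last seen x-ratelimit-remaining-tokens
        self._resume_at = 0.0          # monotonic time before which we should not send

    def update(self, headers):
        with self._lock:
            remaining = headers.get("x-ratelimit-remaining-tokens")
            if remaining is not None:
                self._remaining_tokens = int(remaining)
            retry_after = headers.get("retry-after")
            if retry_after is not None:
                self._resume_at = time.monotonic() + float(retry_after)

    def wait_if_needed(self, estimated_tokens):
        with self._lock:
            delay = max(0.0, self._resume_at - time.monotonic())
            low_budget = (
                self._remaining_tokens is not None
                and self._remaining_tokens < estimated_tokens
            )
        if delay or low_budget:
            time.sleep(max(delay, 1.0))


budget = HeaderBudget()
client = openai.OpenAI()


def guarded_completion(messages, estimated_tokens=500):
    budget.wait_if_needed(estimated_tokens)
    # with_raw_response exposes the HTTP headers alongside the parsed body
    raw = client.chat.completions.with_raw_response.create(
        model="gpt-4o-mini", messages=messages
    )
    budget.update(raw.headers)
    return raw.parse()
```

If this logic lived inside the client, batch callers (LangChain or otherwise) would get proactive throttling for free instead of reacting to 429s.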

Additional context

Related to #937 (comment)
