Handling of TPM limit errors for Azure (x-rate-limit-reset-tokens) #937

@plamzd

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

We've been noticing an increasing number of TPM (tokens-per-minute) limit errors when calling an Azure-hosted model via the library. We have a couple of retries configured, but they do not help. The cause appears to be that the Azure API recently stopped returning the Retry-After header on rate-limit errors and now returns x-rate-limit-reset-tokens instead. The library currently only knows how to handle Retry-After.
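Until the library honors this header itself, a client-side wrapper along the following lines is what we have in mind. This is a minimal sketch, not verified behaviour: the AzureOpenAI configuration values are placeholders, and it assumes the x-rate-limit-reset-tokens value is a number of seconds until the token budget resets.

```python
import time

import openai
from openai import AzureOpenAI

# Placeholder configuration -- endpoint, key and api_version are assumptions.
client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",
    api_key="<key>",
    api_version="2023-12-01-preview",
)


def create_with_token_limit_retry(max_attempts: int = 5, **kwargs):
    """Retry chat completions on 429s, honoring x-rate-limit-reset-tokens when present."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError as e:
            reset = e.response.headers.get("x-rate-limit-reset-tokens")
            # Assumption: the header value is seconds until the token budget resets.
            wait = float(reset) if reset is not None else 2 ** attempt
            time.sleep(wait)
    raise RuntimeError("still rate limited after retries")


# Usage (deployment name and messages are placeholders):
# create_with_token_limit_retry(model="<deployment>", messages=[{"role": "user", "content": "hi"}])
```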

To Reproduce

  • Force a token limit error on an Azure-hosted model
  • Observe the response headers. Example:
[2023-12-03 18:48:27.180] DEBUG worker_pool_8 [httpcore.http11.trace:45] receive_response_headers.complete return_value=(b'HTTP/1.1', 429, b'Too Many Requests', [(b'Content-Length', b'329'), (b'Content-Type', b'application/json'), (b'x-rate-limit-reset-tokens', b'55'), (b'apim-request-id', b'<uuid>'), (b'Strict-Transport-Security', b'max-age=31536000; includeSubDomains; preload'), (b'x-content-type-options', b'nosniff'), (b'policy-id', b'DeploymentRatelimit-Token'), (b'x-ms-region', b'West US'), (b'x-ratelimit-remaining-requests', b'52'), (b'Date', b'Sun, 03 Dec 2023 18:48:27 GMT')])

The Retry-After header is no longer present; the x-rate-limit-reset-tokens header is returned instead.
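For reference, the header dump above comes from httpcore's trace logging; something like the following (standard-library logging only, nothing library-specific) should surface the same receive_response_headers lines, including the 429 response headers:

```python
import logging

# Enable DEBUG logging so httpcore's trace events (e.g. receive_response_headers.complete)
# are printed along with the response headers of the 429.
logging.basicConfig(level=logging.DEBUG)
```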

Code snippets

No response

OS

macOS

Python version

Python 3.12

Library version

openai 1.3.6

Labels

bug (Something isn't working)
