Add NVIDIA TensorRT-LLM optimization guide for GPT-OSS models #1983


Merged: 4 commits, Aug 5, 2025

Conversation

@jayrodge (Contributor) commented on Aug 5, 2025

Summary

Adds a comprehensive guide for optimizing OpenAI GPT-OSS models using NVIDIA TensorRT-LLM.

Changes

  • Add detailed guide for optimizing gpt-oss-20b and gpt-oss-120b models
  • Include hardware prerequisites (16 GB+ VRAM, recommended GPUs)
  • Provide installation instructions for TensorRT-LLM via NGC containers and Docker
  • Add Python API examples for model loading and inference
  • Include performance optimization tips and next steps
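The Python API examples themselves are not reproduced in this PR description. As a rough illustration of what the guide covers, a minimal sketch using TensorRT-LLM's high-level `LLM` API might look like the following; the model id, prompt, and sampling values here are assumptions, and running it requires a supported NVIDIA GPU with sufficient VRAM:

```python
# Sketch of loading a GPT-OSS model and running inference with
# TensorRT-LLM's high-level Python API. Requires a supported NVIDIA GPU;
# the model id and sampling parameters below are illustrative assumptions.
from tensorrt_llm import LLM, SamplingParams


def main():
    # Load the model (downloads/compiles an optimized engine as needed).
    llm = LLM(model="openai/gpt-oss-20b")

    # Sampling settings are placeholders, not values from the merged guide.
    params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

    outputs = llm.generate(
        ["Explain what TensorRT-LLM does in one sentence."],
        params,
    )
    for out in outputs:
        print(out.outputs[0].text)


if __name__ == "__main__":
    main()
```

The same `LLM` object can batch multiple prompts in a single `generate` call, which is where much of the throughput benefit of TensorRT-LLM's optimized engines comes from.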

Benefits

  • Helps users optimize GPT-OSS models for high-performance inference
  • Provides clear hardware requirements and setup instructions
  • Includes practical code examples for immediate use

@pap-openai merged commit 3d32e44 into openai:main on Aug 5, 2025 (1 check passed)