# GPT-OSS Models
This page provides detailed technical specifications for the OpenAI GPT-OSS model family supported by NeMo Customizer. For supported features and capabilities, refer to Tested Models.
## Before You Start

Ensure that `hfTargetDownload` is enabled in the Helm configuration and that a Hugging Face API key secret is available. Refer to the Hugging Face API Key Secret Guide for setup instructions.
## GPT-OSS 20B

| Property | Value |
|---|---|
| Creator | OpenAI |
| Architecture | Mixture of Experts (MoE) Transformer |
| Description | GPT-OSS 20B provides lower latency for local or specialized use cases, featuring full chain-of-thought reasoning and agentic capabilities. |
| Max I/O Tokens | Not specified |
| Parameters | 21B parameters (3.6B active parameters) |
| Training Data | Trained on harmony response format |
| Memory Requirements | Runs within 16 GB of memory with MXFP4 quantization |
| Recommended GPU Count for Customization | 8 |
| Default Name | `openai/gpt-oss-20b` |
| HF Model URI | |
### Training Options (20B)

- All Weights
- Sequence Packing: Not supported
- Default training max sequence length: 2048
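To make the training options concrete, the following is a hypothetical sketch of an all-weights customization job body for GPT-OSS 20B. The field and hyperparameter names are illustrative assumptions, not the authoritative NeMo Customizer request schema; check your deployment's API reference for the exact shape.

```python
import json

# Hypothetical customization job body for GPT-OSS 20B.
# Field names here are illustrative assumptions, not the official schema.
payload = {
    "config": "openai/gpt-oss-20b",        # default model name from this page
    "dataset": {"name": "my-dataset"},     # assumption: a pre-registered dataset
    "hyperparameters": {
        "finetuning_type": "all_weights",  # the supported option for GPT-OSS
        "epochs": 1,                       # assumption: illustrative value
        # Sequence packing is not supported for GPT-OSS models;
        # 2048 is the default training max sequence length.
        "max_seq_length": 2048,
    },
}

print(json.dumps(payload, indent=2))
# The job would then be submitted to the Customizer API, for example
# (hypothetical endpoint):
# requests.post(f"{CUSTOMIZER_URL}/v1/customization/jobs", json=payload)
```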
## GPT-OSS 120B

| Property | Value |
|---|---|
| Creator | OpenAI |
| Architecture | Mixture of Experts (MoE) Transformer |
| Description | GPT-OSS 120B supports production, general purpose, and high reasoning use cases with advanced generation capabilities at scale. |
| Max I/O Tokens | Not specified |
| Parameters | 117B parameters (5.1B active parameters) |
| Training Data | Trained on harmony response format |
| Memory Requirements | Fits on a single 80 GB GPU (NVIDIA H100 or AMD MI300X) with MXFP4 quantization |
| Recommended GPU Count for Customization | Single H100 node |
| Default Name | `openai/gpt-oss-120b` |
| HF Model URI | |
### Training Options (120B)

- All Weights
- Sequence Packing: Not supported
## Usage Recommendations

### Reasoning Levels

Both GPT-OSS models support configurable reasoning levels that you can set in the system prompt:

- **Low**: Fast responses for general dialogue
- **Medium**: Balanced speed and detail
- **High**: Deep and detailed analysis

For example, include "Reasoning: high" in the system prompt to request high-effort reasoning.
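The reasoning-level instruction above can be sketched as a chat message list. The client call shown in the comment is an assumption about an OpenAI-compatible serving setup; the "Reasoning: high" line in the system prompt is what sets the level.

```python
# Minimal sketch of setting the GPT-OSS reasoning level via the system
# prompt. The serving client and endpoint are assumptions about your setup.
messages = [
    {
        "role": "system",
        # Appending "Reasoning: high" requests deep, detailed analysis;
        # use "low" or "medium" for faster or balanced responses.
        "content": "You are a helpful assistant.\nReasoning: high",
    },
    {"role": "user", "content": "Explain why the sky is blue."},
]

# With an OpenAI-compatible endpoint, the request might look like:
# response = client.chat.completions.create(
#     model="openai/gpt-oss-20b", messages=messages
# )
print(messages[0]["content"])
```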
Model Selection Guidelines#
GPT-OSS 20B: Ideal for lower latency requirements, local deployment, specialized use cases, and consumer hardware (with Ollama support)
GPT-OSS 120B: Best for production environments, complex reasoning tasks, and scenarios requiring full capability within single-GPU constraints
### Important Usage Note

Both models use the harmony response format and require that format for proper functionality.
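As a rough illustration (not an authoritative rendering), a harmony-formatted conversation interleaves role and channel markers as special tokens. The exact token sequence below is an assumption based on public descriptions of the format; consult the official harmony format reference before relying on it.

```python
# Illustrative sketch of the harmony response format's role/channel
# structure. The special tokens shown are assumptions, not a verified
# rendering; see the official harmony format documentation.
rendered = (
    "<|start|>system<|message|>You are a helpful assistant.\n"
    "Reasoning: high<|end|>\n"
    "<|start|>user<|message|>What is 2 + 2?<|end|>\n"
    "<|start|>assistant<|channel|>final<|message|>4<|end|>"
)
print(rendered)
```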
> **Note:** Sequence packing and LoRA are not supported for GPT-OSS models in NeMo Customizer.