GPT-OSS Models#

This page provides detailed technical specifications for the OpenAI GPT-OSS model family supported by NeMo Customizer. For supported features and capabilities, refer to Tested Models.

Before You Start#

Ensure that hfTargetDownload is enabled in the Helm configuration and that a Hugging Face API key secret is available. Refer to the Hugging Face API Key Secret Guide for setup instructions.


GPT-OSS 20B#

Property

Value

Creator

OpenAI

Architecture

Mixture of Experts (MoE) Transformer

Description

GPT-OSS 20B provides lower latency for local or specialized use cases, featuring full chain-of-thought reasoning and agentic capabilities.

Max I/O Tokens

Not specified

Parameters

21B parameters (3.6B active parameters)

Training Data

Trained on harmony response format

Memory Requirements

Runs within 16GB of memory with MXFP4 quantization

Recommended GPU Count for Customization

8

Default Name

openai/gpt-oss-20b

HF Model URI

hf://openai/gpt-oss-20b

Training Options (20B)#

  • All Weights

  • Sequence Packing: Not supported

Default training max sequence length: 2048.

GPT-OSS 120B#

Property

Value

Creator

OpenAI

Architecture

Mixture of Experts (MoE) Transformer

Description

GPT-OSS 120B supports production, general purpose, and high reasoning use cases with advanced generation capabilities at scale.

Max I/O Tokens

Not specified

Parameters

117B parameters (5.1B active parameters)

Training Data

Trained on harmony response format

Memory Requirements

Fits on a single 80GB GPU (NVIDIA H100 or AMD MI300X) with MXFP4 quantization

Recommended GPU Count for Customization

Single H100 node

Default Name

openai/gpt-oss-120b

HF Model URI

hf://openai/gpt-oss-120b

Training Options (120B)#

  • All Weights

  • Sequence Packing: Not supported

Usage Recommendations#

Reasoning Levels#

Both GPT-OSS models support configurable reasoning levels that you can set in system prompts:

  • Low: Fast responses for general dialogue

  • Medium: Balanced speed and detail

  • High: Deep and detailed analysis

Example: Set reasoning level using “Reasoning: high” in the system prompt.

Model Selection Guidelines#

  • GPT-OSS 20B: Ideal for lower latency requirements, local deployment, specialized use cases, and consumer hardware (with Ollama support)

  • GPT-OSS 120B: Best for production environments, complex reasoning tasks, and scenarios requiring full capability within single-GPU constraints

Important Usage Note#

Both models use the harmony response format and require this format for proper functionality.

Note

Sequence packing and LoRA is not supported for GPT-OSS models in NeMo Customizer.