# SGLang Deployment
SGLang is a fast serving framework for large language models and vision-language models. This deployment type launches SGLang servers using the `lmsysorg/sglang` Docker image.
## Configuration

### Required Settings
See the complete configuration structure in `configs/deployment/sglang.yaml`.
```yaml
deployment:
  type: sglang
  image: lmsysorg/sglang:latest
  hf_model_handle: hf-model/handle    # HuggingFace model ID
  checkpoint_path: null               # or provide a path to the stored checkpoint
  served_model_name: your-model-name
  port: 8000
```
**Required Fields:**

- `checkpoint_path` or `hf_model_handle`: Model path or HuggingFace model ID (e.g., `meta-llama/Llama-3.1-8B-Instruct`)
- `served_model_name`: Name for the served model
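
For example, a minimal sketch that serves a model straight from the HuggingFace Hub; the handle and served name below are illustrative placeholders:

```yaml
deployment:
  type: sglang
  hf_model_handle: meta-llama/Llama-3.1-8B-Instruct  # illustrative model ID
  checkpoint_path: null                              # no local checkpoint in this case
  served_model_name: llama-3.1-8b-instruct
```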
### Optional Settings
```yaml
deployment:
  tensor_parallel_size: 8   # Default: 8
  data_parallel_size: 1     # Default: 1
  extra_args: ""            # Extra SGLang server arguments
  env_vars: {}              # Environment variables (key: value dict)
```
**Configuration Fields:**

- `tensor_parallel_size`: Number of GPUs for tensor parallelism (default: 8)
- `data_parallel_size`: Number of data parallel replicas (default: 1)
- `extra_args`: Extra command-line arguments to pass to the SGLang server
- `env_vars`: Environment variables for the container
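
As a sketch of these fields in combination, the snippet below splits 8 GPUs into two 4-GPU tensor-parallel replicas and passes two standard SGLang server flags through `extra_args`; all values shown are illustrative:

```yaml
deployment:
  tensor_parallel_size: 4   # 4 GPUs per replica
  data_parallel_size: 2     # 2 replicas -> 8 GPUs total
  extra_args: "--mem-fraction-static 0.85 --context-length 32768"
  env_vars:
    HF_TOKEN: your-hf-token # e.g., for gated models
```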
## API Endpoints
The SGLang deployment exposes OpenAI-compatible endpoints:
```yaml
endpoints:
  chat: /v1/chat/completions
  completions: /v1/completions
  health: /health
```
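
Once the server is running, these endpoints can be exercised directly. The commands below assume the server is reachable on `localhost:8000`; the `model` field must match your `served_model_name`:

```bash
# Readiness probe: returns HTTP 200 once the server is up
curl -s http://localhost:8000/health

# OpenAI-compatible chat request
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "your-model-name",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 64
      }'
```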
## Example Configuration
```yaml
defaults:
  - execution: slurm/default
  - deployment: sglang
  - _self_

deployment:
  checkpoint_path: Qwen/Qwen3-4B-Instruct-2507
  served_model_name: qwen3-4b-instruct-2507
  tensor_parallel_size: 8
  data_parallel_size: 8
  extra_args: ""

execution:
  hostname: your-cluster-headnode
  account: your-account
  output_dir: /path/to/output
  walltime: 02:00:00

evaluation:
  tasks:
    - name: gpqa_diamond
    - name: ifeval
  env_vars:
    HF_TOKEN: HF_TOKEN_FOR_GPQA_DIAMOND  # or use HF_HOME
```
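
Assuming the file above is saved as `configs/sglang_qwen3.yaml` (the file name and config directory here are illustrative), it can be launched with the launcher's Hydra-style `run` command:

```bash
nemo-evaluator-launcher run --config-dir configs --config-name sglang_qwen3
```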
## Command Template

The launcher uses the following command template to start the SGLang server (from `configs/deployment/sglang.yaml`):
```bash
python3 -m sglang.launch_server \
    --model-path ${oc.select:deployment.hf_model_handle,/checkpoint} \
    --host 0.0.0.0 \
    --port ${deployment.port} \
    --served-model-name ${deployment.served_model_name} \
    --tp ${deployment.tensor_parallel_size} \
    --dp ${deployment.data_parallel_size} \
    ${deployment.extra_args}
```
**Note:** The `${oc.select:deployment.hf_model_handle,/checkpoint}` syntax uses OmegaConf's `oc.select` resolver: it resolves to `deployment.hf_model_handle` when that field is set and otherwise falls back to the literal path `/checkpoint`. In practice, set `checkpoint_path` to your model path or HuggingFace model ID.
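
For illustration, if `hf_model_handle` is `meta-llama/Llama-3.1-8B-Instruct`, `served_model_name` is `llama-3.1-8b-instruct`, and the remaining fields keep the defaults shown above (port 8000, TP 8, DP 1), the template would render to roughly:

```bash
python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 8000 \
    --served-model-name llama-3.1-8b-instruct \
    --tp 8 \
    --dp 1
```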
## Reference

Configuration file source: `packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml`