# SGLang Deployment
SGLang is a serving framework for large language models. This deployment type launches SGLang servers using the `lmsysorg/sglang` Docker image.
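For orientation, here is a rough sketch of an equivalent manual invocation of that image. This is illustrative only, not the launcher's exact command (see the Command Template section below); the GPU flags, port mapping, and model ID are placeholder assumptions:

```bash
# Illustrative single-node run of the same image and entrypoint the launcher
# uses; flags, paths, and the model ID here are assumptions, not launcher output.
docker run --gpus all -p 8000:8000 \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 8000 \
    --served-model-name llama-3.1-8b-instruct
```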
## Configuration
### Required Settings
See the complete configuration structure in `configs/deployment/sglang.yaml`.
```yaml
deployment:
  type: sglang
  image: lmsysorg/sglang:latest
  checkpoint_path: /path/to/model   # Path to model (local or HuggingFace model ID)
  served_model_name: your-model-name
  port: 8000
```
**Required Fields:**

- `checkpoint_path`: Model path or HuggingFace model ID (e.g., `meta-llama/Llama-3.1-8B-Instruct`)
- `served_model_name`: Name for the served model
### Optional Settings
```yaml
deployment:
  tensor_parallel_size: 8   # Default: 8
  data_parallel_size: 1     # Default: 1
  extra_args: ""            # Extra SGLang server arguments
  env_vars: {}              # Environment variables (key: value dict)
```
**Configuration Fields:**

- `tensor_parallel_size`: Number of GPUs for tensor parallelism (default: 8)
- `data_parallel_size`: Number of data-parallel replicas (default: 1)
- `extra_args`: Extra command-line arguments passed to the SGLang server (see the expanded example after this list)
- `env_vars`: Environment variables set in the container
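As an example, setting `extra_args: "--context-length 8192 --mem-fraction-static 0.8"` (both are real `sglang.launch_server` flags; the values are illustrative) would make the rendered launch command look roughly like this:

```bash
# Defaults from above (tp=8, dp=1) plus the illustrative extra_args appended
# at the end, as in the command template shown later in this page.
python3 -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --served-model-name llama-3.1-8b-instruct \
  --tp 8 \
  --dp 1 \
  --context-length 8192 \
  --mem-fraction-static 0.8
```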
## API Endpoints
The SGLang deployment exposes OpenAI-compatible endpoints:
```yaml
endpoints:
  chat: /v1/chat/completions
  completions: /v1/completions
  health: /health
```
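Once the server is up, you can exercise these endpoints with standard OpenAI-style requests. A quick smoke test, assuming the server is reachable on `localhost:8000` (adjust host and port to your deployment):

```bash
# Health check: succeeds once the server is ready.
curl http://localhost:8000/health

# Chat completion via the OpenAI-compatible endpoint; "model" must match
# deployment.served_model_name.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "your-model-name",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```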
## Example Configuration
```yaml
defaults:
  - execution: slurm/default
  - deployment: sglang
  - _self_

deployment:
  checkpoint_path: meta-llama/Llama-3.1-8B-Instruct
  served_model_name: llama-3.1-8b-instruct
  tensor_parallel_size: 4
  data_parallel_size: 1
  extra_args: ""
  env_vars:
    HF_HOME: "/cache/huggingface"  # make sure you have access to GPQA-Diamond and meta-llama/Llama-3.1-8B-Instruct

execution:
  account: your-account
  output_dir: /path/to/output
  walltime: 02:00:00

evaluation:
  tasks:
    - name: gpqa_diamond
    - name: ifeval
  env_vars:
    HF_TOKEN: HF_TOKEN_FOR_GPQA_DIAMOND  # or use HF_HOME
```
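Assuming this config is saved as, say, `my_sglang_eval.yaml` in a local config directory (the file name is hypothetical), you would launch it along these lines; the exact flags may differ, so check `nemo-evaluator-launcher run --help`:

```bash
# Sketch of a launch command, assuming a Hydra-style CLI with a config
# directory and config name.
nemo-evaluator-launcher run \
  --config-dir ./configs \
  --config-name my_sglang_eval
```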
## Command Template
The launcher uses the following command template (from `configs/deployment/sglang.yaml`) to start the SGLang server:
```bash
python3 -m sglang.launch_server \
  --model-path ${oc.select:deployment.hf_model_handle,/checkpoint} \
  --host 0.0.0.0 \
  --port ${deployment.port} \
  --served-model-name ${deployment.served_model_name} \
  --tp ${deployment.tensor_parallel_size} \
  --dp ${deployment.data_parallel_size} \
  ${deployment.extra_args}
```
> **Note:** The `${oc.select:deployment.hf_model_handle,/checkpoint}` syntax uses OmegaConf's `oc.select` resolver. In practice, set `checkpoint_path` to your model path or HuggingFace model ID.
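To make the fallback concrete, here is how `--model-path` resolves in the two cases; the interpretation of `/checkpoint` as an in-container mount point is an assumption based on the fallback value:

```bash
# Case 1: deployment.hf_model_handle is set (e.g., a HuggingFace model ID):
#   --model-path meta-llama/Llama-3.1-8B-Instruct
#
# Case 2: deployment.hf_model_handle is unset; the resolver falls back to
# /checkpoint, presumably the in-container path where a local
# checkpoint_path is mounted:
#   --model-path /checkpoint
```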
## Reference
**Configuration File:**

- Source: `packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml`

**Related Documentation:**