Running
Before you can run an NGC deep learning framework container, your Docker® environment must support NVIDIA GPUs.
About this task
To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers For Deep Learning Frameworks User’s Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC Container User Guide.
If you have Docker 19.03 or later, a typical command to launch the container is:
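A minimal sketch of such a command, assuming the image is published at nvcr.io/nvidia/sglang (substitute the registry/repository path for your container):

```shell
# Launch the container interactively with all GPUs visible (Docker 19.03+).
# nvcr.io/nvidia/sglang:xx.yy-py3 is an assumed image path; replace xx.yy
# with the container version you pulled.
docker run --gpus all -it --rm nvcr.io/nvidia/sglang:xx.yy-py3
```

The `--gpus all` flag requires the NVIDIA Container Toolkit; `--rm` removes the container when you exit.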
If you have Docker 19.02 or earlier, a typical command to launch the container is:
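For older Docker releases, the legacy `nvidia-docker` wrapper is used instead of the `--gpus` flag. A sketch, again assuming an image path of nvcr.io/nvidia/sglang:

```shell
# Launch via the nvidia-docker wrapper (Docker 19.02 and earlier).
# nvcr.io/nvidia/sglang:xx.yy-py3 is an assumed image path; replace xx.yy
# with the container version you pulled.
nvidia-docker run -it --rm nvcr.io/nvidia/sglang:xx.yy-py3
```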
Where:
- xx.yy is the container version.
SGLang can be deployed in a client–server configuration. Start the HTTP inference server inside the container:
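One way to start the server, sketched here using SGLang's `launch_server` entry point; the model identifier and port are illustrative placeholders:

```shell
# Start the SGLang HTTP inference server inside the container.
# meta-llama/Llama-3.1-8B-Instruct is a placeholder model; substitute the
# model you want to serve. Port 30000 is an arbitrary choice.
python3 -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 \
  --port 30000
```

Binding to `0.0.0.0` makes the server reachable from outside the container, provided the port is published (e.g. `-p 30000:30000` on `docker run`).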
From a client, issue a text-generation request by POST-ing to /generate with a JSON body containing the prompt and sampling parameters:
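A sketch of such a request with `curl`, assuming the server above is listening on localhost:30000; the `text` and `sampling_params` fields follow SGLang's native `/generate` schema:

```shell
# POST a generation request to the SGLang server (address is illustrative).
curl http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0}
      }'
```

The response is a JSON object whose `text` field contains the generated continuation.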
See /workspace/README.md inside the container for information on getting started and customizing your SGLang image.
You might want to pull in data and model descriptions from locations outside the container for use by SGLang. The easiest way to do this is to mount one or more host directories into the container as Docker bind mounts. For example:
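A sketch using the `-v` flag; the host paths, container mount points, and image path are all illustrative:

```shell
# Bind-mount host directories into the container (paths are placeholders).
# /raid/datasets on the host appears as /data inside the container, and
# /raid/models appears as /models.
docker run --gpus all -it --rm \
  -v /raid/datasets:/data \
  -v /raid/models:/models \
  nvcr.io/nvidia/sglang:xx.yy-py3
```

Files written to the mounted paths from inside the container persist on the host after the container exits.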