Running
Before you can run an NGC deep learning framework container, your Docker® environment must support NVIDIA GPUs.
About this task
To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers For Deep Learning Frameworks User’s Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC Container User Guide.
If you have Docker 19.03 or later, a typical command to launch the container is:
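A minimal sketch of such a command, assuming the image is published at nvcr.io/nvidia/sglang (substitute the registry/repository path for your container):

```shell
# Launch the container interactively with all GPUs visible (Docker 19.03+).
# nvcr.io/nvidia/sglang:xx.yy-py3 is an assumed image path; replace xx.yy
# with the container version you pulled.
docker run --gpus all -it --rm nvcr.io/nvidia/sglang:xx.yy-py3
```

The `--gpus all` flag requires the NVIDIA Container Toolkit; `--rm` removes the container when you exit.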
If you have Docker 19.02 or earlier, a typical command to launch the container is:
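For older Docker releases, the legacy `nvidia-docker` wrapper is used instead of the `--gpus` flag. A sketch, again assuming an image path of nvcr.io/nvidia/sglang:

```shell
# Launch via the nvidia-docker wrapper (Docker 19.02 and earlier).
# nvcr.io/nvidia/sglang:xx.yy-py3 is an assumed image path; replace xx.yy
# with the container version you pulled.
nvidia-docker run -it --rm nvcr.io/nvidia/sglang:xx.yy-py3
```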
Where:
- xx.yy is the container version.
SGLang can be deployed in a client–server configuration. Start the HTTP inference server inside the container:
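One way to start the server, sketched here using SGLang's `launch_server` entry point; the model identifier and port are illustrative placeholders:

```shell
# Start the SGLang HTTP inference server inside the container.
# meta-llama/Llama-3.1-8B-Instruct is a placeholder model; substitute the
# model you want to serve. Port 30000 is an arbitrary choice.
python3 -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 \
  --port 30000
```

Binding to `0.0.0.0` makes the server reachable from outside the container, provided the port is published (e.g. `-p 30000:30000` on `docker run`).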
From a client, issue a text-generation request by POST-ing to /generate with a JSON body containing the prompt and sampling parameters:
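A sketch of such a request with `curl`, assuming the server above is listening on localhost:30000; the `text` and `sampling_params` fields follow SGLang's native `/generate` schema:

```shell
# POST a generation request to the SGLang server (address is illustrative).
curl http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0}
      }'
```

The response is a JSON object whose `text` field contains the generated continuation.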
See /workspace/README.md inside the container for information on getting started and customizing your SGLang image.
You might want to pull in data and model descriptions from locations outside the container for use by SGLang. The easiest way to do this is to mount one or more host directories into the container as Docker bind mounts. For example:
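A sketch using the `-v` flag; the host paths, container mount points, and image path are all illustrative:

```shell
# Bind-mount host directories into the container (paths are placeholders).
# /raid/datasets on the host appears as /data inside the container, and
# /raid/models appears as /models.
docker run --gpus all -it --rm \
  -v /raid/datasets:/data \
  -v /raid/models:/models \
  nvcr.io/nvidia/sglang:xx.yy-py3
```

Files written to the mounted paths from inside the container persist on the host after the container exits.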