Create Configuration#

Create a new deployment configuration for a NIM microservice you want to deploy.

Prerequisites#

Before you can create a NIM deployment configuration, make sure that you have:

  • Access to the NeMo Deployment Management service: use the NeMo platform base URL if you installed the NeMo platform, or the service's standalone base URL if you installed the service individually. Store the base URL in the DEPLOYMENT_BASE_URL environment variable. If you use the Python SDK, also store the NIM Proxy base URL in NIM_PROXY_BASE_URL (see the sketch after this list).

  • Model details and deployment specifications you want to deploy. To find the models supported by NVIDIA NIM, see Models in the NVIDIA NIM for LLMs documentation.
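
The examples below read the base URLs from the environment. The following is a minimal sketch with placeholder hosts; replace the URLs with the ones for your own installation:

import os

# Placeholder URLs for illustration only; point these at your installation.
os.environ.setdefault("DEPLOYMENT_BASE_URL", "https://nemo.example.com")
# NIM_PROXY_BASE_URL is used by the Python SDK example as the inference base URL.
os.environ.setdefault("NIM_PROXY_BASE_URL", "https://nim.example.com")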

To Create a Configuration#

Choose one of the following options to create a configuration: use the NeMo Microservices Python SDK, or make a request to the REST API with curl.

import os

from nemo_microservices import NeMoMicroservices

# Initialize the client with the Deployment Management base URL
# and the NIM Proxy base URL for inference.
client = NeMoMicroservices(
    base_url=os.environ["DEPLOYMENT_BASE_URL"],
    inference_base_url=os.environ["NIM_PROXY_BASE_URL"]
)

# For NVIDIA NGC Models

response = client.deployment.configs.create(
    name="your-custom-config",
    namespace="your-namespace",
    description="Custom configuration for NIM deployment",
    model="meta/llama-3.1-8b-instruct",
    nim_deployment={
        # Example image and tag; use the NIM container image for your
        # model and a tag published on NGC.
        "image_name": "nvcr.io/nim/meta/llama-3.1-8b-instruct",
        "image_tag": "1.8.3",
        # Number of GPUs per deployment; typically at least 1.
        "gpu": 1,
        # Optional environment variables passed to the NIM container.
        "additional_envs": {
            "EXAMPLE_ENV_VAR": "example-value"
        },
        # Kubernetes namespace where the NIM pods run.
        "namespace": "nim-namespace"
    },
    project="your-project",
)
print(response)
print(response)

# For External Models such as OpenAI ChatGPT and build.nvidia.com

response = client.deployment.configs.create(
    name="your-custom-config",
    namespace="your-namespace",
    description="External endpoint configuration",
    external_endpoint={
        # For example, the build.nvidia.com API endpoint.
        "host_url": "https://integrate.api.nvidia.com",
        # Read the key from the environment rather than hardcoding it;
        # NVIDIA_API_KEY is an assumed variable name.
        "api_key": os.environ["NVIDIA_API_KEY"],
        "enabled_models": [
            "meta/llama-3.1-8b-instruct"
        ]
    },
    project="your-project",
)
print(response)
print(response)
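
The create call returns the new configuration object. As a quick sanity check, you can print a few of its fields; this sketch assumes the SDK returns an object whose attributes mirror the response schema shown below:

# Confirm the configuration was created; attribute names mirror
# the example response at the end of this section.
print(response.name)        # "your-custom-config"
print(response.namespace)   # "your-namespace"
print(response.created_at)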

Make a POST request to the /v1/deployment/configs endpoint.

For more details on the request body, see the Deployment Management API reference.

For NVIDIA NGC Models

curl -X POST \
  "${DEPLOYMENT_BASE_URL}/v1/deployment/configs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "string",
    "namespace": "string",
    "description": "string",
    "model": "string",
    "nim_deployment": {
      "image_name": "string",
      "image_tag": "string",
      "gpu": 0,
      "additional_envs": {
        "additionalProp1": "string",
        "additionalProp2": "string",
        "additionalProp3": "string"
      },
      "namespace": "string"
    },
    "project": "string",
    "custom_fields": {},
    "ownership": {
      "created_by": "",
      "access_policies": {}
    }
  }' | jq

For External Models such as OpenAI ChatGPT and build.nvidia.com

curl -X POST \
  "${DEPLOYMENT_BASE_URL}/v1/deployment/configs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "string",
    "namespace": "string",
    "description": "string",
    "model": "string",
    "external_endpoint": {
      "host_url": "https://example.com/",
      "api_key": "string",
      "enabled_models": [
        "string"
      ]
    },
    "project": "string",
    "custom_fields": {},
    "ownership": {
      "created_by": "",
      "access_policies": {}
    }
  }' | jq

Example Response
{
  "created_at": "2025-05-30T23:45:33.033Z",
  "updated_at": "2025-05-30T23:45:33.033Z",
  "name": "string",
  "namespace": "string",
  "description": "string",
  "model": "string",
  "nim_deployment": {
    "image_name": "string",
    "image_tag": "string",
    "gpu": 0,
    "additional_envs": {
      "additionalProp1": "string",
      "additionalProp2": "string",
      "additionalProp3": "string"
    },
    "namespace": "string"
  },
  "external_endpoint": {
    "host_url": "https://example.com/",
    "api_key": "string",
    "enabled_models": [
      "string"
    ]
  },
  "schema_version": "1.0",
  "project": "string",
  "custom_fields": {},
  "ownership": {
    "created_by": "",
    "access_policies": {}
  }
}

For more information about the response of the API, see the Deployment Management API reference. Note that a configuration uses either nim_deployment or external_endpoint; the example response above shows both fields only to illustrate the full schema.

Tip

The configuration is created immediately and can be used for deployments right away.
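
For example, you can reference the new configuration when creating a deployment. The following is a hedged sketch using the SDK's model_deployments resource; the "namespace/config-name" reference format for the config parameter is an assumption here:

# Deploy a NIM using the configuration created above.
# The config reference format "your-namespace/your-custom-config" is assumed.
deployment = client.deployment.model_deployments.create(
    name="llama-8b-deployment",
    namespace="your-namespace",
    config="your-namespace/your-custom-config",
)
print(deployment)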