NIMs do not impose rate limits. If you want to restrict access to your application, you are responsible for implementing a strategy. One option is to set NIM_TRITON_MAX_QUEUE_SIZE, which caps the maximum queue size used by Triton’s dynamic batching.

The NIM uses multiple ports, but only the HTTP API port needs to be accessible from outside the cluster. The service port is set at startup from the NIM_HTTP_API_PORT environment variable (default: 8000).
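
As a rough illustration of the port setting above, the following shell sketch derives the base URL from NIM_HTTP_API_PORT, falling back to 8000, and probes it with curl. The NIM_HOST variable, the function names, and the /v1/health/ready path are assumptions made for this example and are not taken from this page; substitute the host and health endpoint that your deployment actually exposes.

```shell
# Build the NIM base URL from the environment.
# NIM_HOST is a hypothetical variable for this sketch; NIM_HTTP_API_PORT
# falls back to 8000 when it is unset or empty.
nim_base_url() {
  printf 'http://%s:%s\n' "${NIM_HOST:-localhost}" "${NIM_HTTP_API_PORT:-8000}"
}

# Probe the service. The health path is an assumption; adjust as needed.
# Returns non-zero if the request fails or the server answers with an HTTP error.
nim_is_ready() {
  curl --silent --fail --max-time 5 --output /dev/null "$(nim_base_url)/v1/health/ready"
}
```

For example, `nim_is_ready && echo "NIM is reachable at $(nim_base_url)"` prints the URL only when the probe succeeds.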

As a developer, you must secure your own API endpoints. We suggest placing a proxy in front of the NIM and using HTTPS/TLS 1.2.

If you deploy a NeMo Retriever Embedding NIM component using Helm charts, you need to set up at least two secrets in the target namespace:

An image pull secret for NGC

An NGC API key secret

If your cluster requires additional image pull secrets for custom init containers, you also need to create those.

Create the secrets according to your organization’s requirements and Kubernetes secrets best practices. For a proof of concept, or to set up secrets quickly, you can use the following commands, where NAMESPACE is the name of your namespace:

kubectl create secret -n NAMESPACE docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=$NGC_API_KEY

kubectl create secret -n NAMESPACE generic ngc-api-secret \
  --from-literal=NGC_API_KEY=$NGC_API_KEY
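
To confirm that both secrets exist before installing the chart, a check along the following lines can help. This is a minimal sketch: the function name check_nim_secrets is hypothetical, and it assumes the secret names ngc-secret and ngc-api-secret used in the commands above.

```shell
# List the two secrets in the namespace passed as the first argument.
# kubectl reports an error for any secret that does not exist.
check_nim_secrets() {
  kubectl get secret -n "$1" ngc-secret ngc-api-secret
}
```

Call it with your namespace, for example `check_nim_secrets NAMESPACE`.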

NeMo Retriever uses connection strings that may contain credentials. We recommend that you store these connection strings in a secret management solution. For more information, refer to Manage Secrets.