Deploy Boltz2 with Helm#

Boltz2 NIMs are designed to run on systems with NVIDIA GPUs. To deploy using Helm, you must have a Kubernetes cluster with appropriate GPU nodes and the GPU Operator installed.

Prerequisites#

If you have not set up your NGC API key, refer to Getting Started first.

Required Components#

  • Configured Kubernetes cluster with GPU nodes

  • NVIDIA GPU Operator installed

  • Helm 3.x or later

  • kubectl configured to access your cluster

  • NGC API key with access to Boltz2 NIM

Download the Helm Chart#

The Boltz2 NIM Helm chart is available from the NGC Catalog. Use the following command to download the chart:

helm fetch https://helm.ngc.nvidia.com/nim/mit/charts/boltz2-nim-<version>.tgz --username='$oauthtoken' --password=<YOUR API KEY>

Replace <version> with the desired chart version. In most cases, you should use the latest version available at: https://catalog.ngc.nvidia.com/orgs/nim/teams/mit/helm-charts/boltz2-nim/files
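For example, assuming chart version 1.5.0 (an illustration only; check the catalog page above for the current version), the command looks like:

```shell
# VERSION is an assumed value for illustration; substitute the latest
# chart version from the NGC catalog page.
VERSION=1.5.0
helm fetch "https://helm.ngc.nvidia.com/nim/mit/charts/boltz2-nim-${VERSION}.tgz" \
  --username='$oauthtoken' \
  --password="${NGC_API_KEY}"
```

This assumes your NGC API key is exported as `NGC_API_KEY` in the current shell.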

Configure Helm#

The following Helm options are the most important to configure for deploying Boltz2 NIM:

Core Configuration Options#

  • boltz2.repository – The Boltz2 NIM container to deploy

  • boltz2.tag – The version of the container

  • boltz2.servicePort – The port the Kubernetes service listens on (default: 8081)

  • boltz2.structureOptimizedBackend – Backend for structure prediction: "trt" (default) or "pytorch"

  • boltz2.affinityOptimizedBackend – Backend for affinity binding: "trt" (default) or "pytorch"

  • boltz2.enableDiffusionTF32 – Enable TensorFloat-32: "0" or "1" (default: "1")


  • Storage options – Based on your cluster environment (see Storage section)

  • ngc.apiSecret and imagePullSecrets – For NGC authentication

  • resources – GPU and memory requirements

To view all available configuration options, run:

helm show readme boltz2-nim-<version>.tgz | less

To examine default values:

helm show values boltz2-nim-<version>.tgz

Quick Start Deployment#

1. Create NGC Secrets#

Create the necessary Kubernetes secrets for NGC authentication:

# Image pull secret
kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=$NGC_API_KEY

# NGC API key secret
kubectl create secret generic ngc-api \
  --from-literal=NGC_API_KEY=$NGC_API_KEY

2. Create Values File#

Create a file named custom-values.yaml with your configuration:

boltz2:
  repository: "nvcr.io/nim/mit/boltz2"
  tag: "1.5.0"
  structureOptimizedBackend: "trt"
  affinityOptimizedBackend: "trt"
  enableDiffusionTF32: "1"

ngc:
  apiSecret: ngc-api

persistence:
  enabled: true
  size: 100Gi

resources:
  limits:
    nvidia.com/gpu: 1
    memory: 64Gi
  requests:
    memory: 32Gi

imagePullSecrets:
  - name: ngc-secret

3. Install the Chart#

Deploy Boltz2 NIM using Helm:

export CHART_NAME=boltz2-nim
helm install "${CHART_NAME}" boltz2-nim-<version>.tgz -f custom-values.yaml

4. Verify Deployment#

Check the pod status:

kubectl get pods

View logs to monitor startup progress:

kubectl logs -f <pod-name>

Note

Due to large model files, pods may take several minutes to start. The NIM will download model weights on first startup if they are not already present in the persistent volume.
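Rather than watching logs manually, you can block until the pod reports Ready. The label selector below assumes the chart applies the standard `app.kubernetes.io/instance` label (the same label the Uninstall section uses for PVCs); adjust it if your chart version labels pods differently:

```shell
# Wait up to 30 minutes to allow for the first-start model download.
kubectl wait pod \
  --selector app.kubernetes.io/instance="${CHART_NAME}" \
  --for=condition=Ready \
  --timeout=30m
```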

Storage#

Proper storage configuration is critical for Boltz2 NIM deployment. The model cache can be large; the Quick Start values above enable a PersistentVolumeClaim, and the sections below cover alternative storage backends:

Alternative Storage Options#

NFS Storage#

The chart does not support direct NFS volume mounts. For NFS-backed storage, use a StorageClass that provisions NFS volumes and enable persistence:

persistence:
  enabled: true
  storageClass: "nfs-client"  # Or your cluster's NFS provisioner
  accessMode: ReadWriteMany
  size: 100Gi

HostPath (development)#

The chart’s PV template uses persistence.hostPath as the node path for the PersistentVolume (default: /data/nim). To use a different path on the node:

persistence:
  enabled: true
  hostPath: /data/nim   # or e.g. /model-store — must exist on the node
  storageClass: standard
  size: 10Gi

Warning

HostPath ties the volume to a specific node and has security implications. Prefer a StorageClass for multi-node or production use.

GPU Configuration#

Single GPU Deployment#

Most Boltz2 deployments use a single GPU:

resources:
  limits:
    nvidia.com/gpu: 1

GPU Selection#

Use node selectors to target specific GPU types:

nodeSelector:
  nvidia.com/gpu.product: "NVIDIA-H100-80GB-HBM3"

Or use tolerations for GPU taints:

tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

Service Configuration#

Basic Service#

The default service configuration exposes the Boltz2 API. The chart provides:

  • service.type – Service type (default: ClusterIP). This is the only option under the service key.

  • boltz2.servicePort – The port the service listens on (default: 8081). To use a different port, set this under the boltz2 section of your values.

  • Service name – Generated from the Helm release name by the chart’s templates and cannot be overridden.

Example values to change the service port:

service:
  type: ClusterIP

boltz2:
  servicePort: 9090  # Default is 8081

Port Forwarding (Testing)#

For local testing, use port forwarding. Use the same port as boltz2.servicePort (default 8081) on the right side of the mapping:

kubectl port-forward service/"${CHART_NAME}" 8080:8081

If you set boltz2.servicePort to a different value (e.g. 9090), use that port instead:

kubectl port-forward service/"${CHART_NAME}" 8080:9090

Test the API:

curl http://localhost:8080/v1/health/live
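The liveness endpoint only confirms the process is up. To wait until the model is actually loaded, you can poll the readiness endpoint (the chart's probes use `/v1/health/ready`, per the Custom Probes section below):

```shell
# Retry until the service reports ready (model weights loaded).
until curl -sf http://localhost:8080/v1/health/ready; do
  echo "waiting for readiness..."
  sleep 10
done
```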

Ingress Configuration#

For production deployments, configure ingress:

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: boltz2.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: boltz2-tls
      hosts:
        - boltz2.example.com

Advanced Configuration#

Autoscaling#

The chart supports horizontal pod autoscaling based on CPU utilization (GPU-based autoscaling is not supported):

autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80

If your chart version supports it, you can also set targetMemoryUtilizationPercentage for memory-based scaling.
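CPU-based autoscaling requires a metrics source such as metrics-server in the cluster. After enabling autoscaling, you can confirm the HorizontalPodAutoscaler is tracking metrics; the HPA object name below is an assumption (it typically follows the release name), so adjust it to whatever `kubectl get hpa` shows:

```shell
# TARGETS should show a CPU percentage, not <unknown>.
kubectl get hpa
kubectl describe hpa "${CHART_NAME}"
```

If TARGETS shows `<unknown>`, the metrics API is not serving pod metrics and the HPA cannot scale.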

Custom Probes#

The chart configures liveness and readiness probes under probes.liveness and probes.readiness. Probe paths are fixed in the template (/v1/health/live and /v1/health/ready); there are no startupProbe, enabled, or path fields. Adjust timings for large models or slow storage:

probes:
  liveness:
    initialDelaySeconds: 300
    periodSeconds: 10
    timeoutSeconds: 1
    successThreshold: 1
    failureThreshold: 3
  readiness:
    initialDelaySeconds: 300
    periodSeconds: 10
    timeoutSeconds: 1
    successThreshold: 1
    failureThreshold: 3

Increase initialDelaySeconds or failureThreshold if pods need more time to become ready.

Security Context#

Configure pod and container security (chart defaults):

podSecurityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false

The chart uses securityContext for the container; there are no containerSecurityContext or capabilities keys in the default values.

Run Inference#

Once deployed, you can run predictions using the API:

curl -X POST http://localhost:8080/biology/mit/boltz2/predict \
  -H "Content-Type: application/json" \
  -d '{
    "polymers": [
      {
        "molecule_type": "protein",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSQELLSRYPDLDAKGRERAIAKDLGAVFLVGIGGKLSDGHRHDVRAPDYDDWSTPSELGHAGLNGDILVWNPVLEDAFELSSMGIRVDADTLKHQLALTGDEDRLELEWHQALLRGEMPQTIGGGIGQSRLTMLLLQLPHIGQVQAGVWPAAVRESVPSLL"
      }
    ],
    "diffusion_samples": 1,
    "recycling_steps": 3
  }'

For detailed API documentation, refer to Inference.

Troubleshooting#

Pod Stuck in Pending#

Check events:

kubectl describe pod <pod-name>

Common issues:

  • Insufficient GPU resources

  • Node taints requiring tolerations

  • Storage mount failures

  • Image pull errors
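To rule out the first cause, check how many GPUs each node actually advertises to the scheduler (this requires the GPU Operator's device plugin to be running on the node):

```shell
# Allocatable GPU count per node; empty or 0 means the device plugin
# is not exposing GPUs on that node.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```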

Pod Fails to Start#

View logs:

kubectl logs <pod-name>

If the pod needs more time to become ready, increase probe delays or failure thresholds under probes (the chart does not support a startup probe):

probes:
  liveness:
    initialDelaySeconds: 600  # Allow more time before first check
    failureThreshold: 10
  readiness:
    initialDelaySeconds: 600
    failureThreshold: 10

Storage Issues#

Verify PVC status:

kubectl get pvc
kubectl describe pvc <pvc-name>

Ensure the storage class supports the required access mode.
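To list the storage classes available in your cluster and see which provisioner backs each one:

```shell
# Check that the class named in persistence.storageClass exists and that
# its provisioner supports your accessMode (e.g. ReadWriteMany for NFS).
kubectl get storageclass
```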

GPU Access#

Test GPU availability:

kubectl run gpu-test \
  --image=nvidia/cuda:12.6.2-base-ubuntu22.04 \
  --restart=Never \
  --command -- nvidia-smi
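After the test pod runs, inspect its output and clean it up. Note that on clusters where GPUs are exposed only to pods that request an `nvidia.com/gpu` resource, this unqualified test pod may not see a GPU; in that case inspect the Boltz2 pod itself instead. The `--for=jsonpath` form below needs kubectl 1.23 or later:

```shell
# Wait for the test pod to finish, show nvidia-smi output, then remove it.
kubectl wait pod gpu-test --for=jsonpath='{.status.phase}'=Succeeded --timeout=5m
kubectl logs gpu-test
kubectl delete pod gpu-test
```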

Uninstall#

To remove the Boltz2 NIM deployment:

helm uninstall "${CHART_NAME}"

To also remove persistent volumes:

kubectl delete pvc -l app.kubernetes.io/instance="${CHART_NAME}"

Additional Resources#