Configure Jobs#
This section describes how to configure the jobs component of the NeMo Core microservice. This component is responsible for scheduling jobs and is the basis for how the following functional microservices execute jobs.
NeMo Auditor
NeMo Data Designer
NeMo Evaluator
NeMo Safe Synthesizer
Executors#
You can configure jobs through executors, which target different job execution backends and provide CPU and GPU compute.
You can define job executors as a combination of the following attributes:
A profile name (profile)
A compute provider (for example, cpu or gpu)
An execution backend (for example, docker or kubernetes_job)
The Core microservice supports the following execution backends:
Docker (docker)
Kubernetes Jobs (kubernetes_job)
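Each executor entry in the Core microservice configuration combines these attributes. As a minimal sketch, an executor definition looks like the following; the keys under config are backend-specific and are covered in the sections below:

jobs:
  executors:
    - profile: default
      provider: cpu
      backend: docker
      config: {...}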
Profiles#
You can configure multiple execution profiles to suit the shape of your compute environment.
By default, the Core microservice defines a default CPU provider and a default GPU provider to launch CPU-bound and GPU-bound jobs, respectively.
You can define any number of execution profiles. For example, if your compute environment includes heterogeneous infrastructure, such as two types of GPU hardware (A100 and H200), you can define a list of execution profiles as follows:
jobs:
  executors:
    # Allow any CPU-bound job to run anywhere
    - provider: cpu
      profile: default
      backend: kubernetes_job
      config: {...}
    # Run on A100 hardware
    - provider: gpu
      profile: a100-pool
      backend: kubernetes_job
      config:
        # use the appropriate pod scheduling for a100-pool
        node_selector:
          node-pool-name: a100-pool
    # Run on H200 hardware
    - provider: gpu
      profile: h200-pool
      backend: kubernetes_job
      config:
        # use the appropriate pod scheduling for h200-pool
        node_selector:
          node-pool-name: h200-pool
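The node_selector values above assume that the nodes in each pool carry a matching node-pool-name label; the label key and values are illustrative and depend on how your cluster nodes are labeled. You can check which nodes a given profile would target with a standard label query, for example:

kubectl get nodes -l node-pool-name=a100-pool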
Execution Backends#
Execution backends are the containerized job execution systems that NeMo microservices platform jobs are scheduled into.
Docker#
The Core microservice supports Docker as an execution backend for CPU and GPU based jobs.
Note: When you run the NeMo microservices platform quickstart, the Core microservice supports Docker-based job execution by default and requires no additional configuration.
jobs:
  executors:
    # Define the default CPU provider
    - provider: cpu
      profile: default
      backend: docker
      config:
        storage:
          volume_name: nemo-microservices_jobs_storage
    # Define the default GPU provider
    - provider: gpu
      profile: default
      backend: docker
      config:
        storage:
          volume_name: nemo-microservices_jobs_storage
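The volume_name above refers to a named Docker volume that launched jobs use for storage. If that volume does not already exist in your Docker environment (for example, because you are configuring Docker execution outside the quickstart), you can create one with the same name using the standard Docker CLI:

docker volume create nemo-microservices_jobs_storage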
Kubernetes Jobs#
The Core microservice supports Kubernetes Jobs as an execution backend for CPU and GPU based jobs.
Note: When deploying using the NeMo Microservices Helm Chart, the logging, storage, and image_pull_secrets configurations are automatically configured for you. They are documented here for transparency.
Note: Refer to the following Core microservice configuration for advanced options.
jobs:
  executors:
    # Define the default CPU provider
    - profile: default
      backend: kubernetes_job
      provider: cpu
      config:
        # Storage is the kubernetes_job storage configuration
        storage:
          # Define the name of a persistent volume claim that will be used by launched jobs
          pvc_name: nemo-core-jobs-storage
        # Logging is the logging configuration for a kubernetes_job
        logging:
          configmap: nemo-core-jobs-logsidecar
          image:
            repository: fluent/fluent-bit
            tag: 4.0.7
        # Image_pull_secrets is the list of image pull secrets needed to pull images from container registries
        image_pull_secrets:
          - name: nvcrimagepullsecret
    # Define the default GPU provider
    - profile: default
      backend: kubernetes_job
      provider: gpu
      config:
        storage:
          pvc_name: nemo-core-jobs-storage
        logging:
          configmap: nemo-core-jobs-logsidecar
          image:
            repository: fluent/fluent-bit
            tag: 4.0.7
        image_pull_secrets:
          - name: nvcrimagepullsecret
        # You can configure custom labels and annotations on Kubernetes Jobs and their pods.
        # This can be useful in environments that require integration with service meshes or similar cluster-level systems.
        job_metadata:
          labels:
            my-custom-label: "value"
          annotations:
            example.com/annotation: "value"
        pod_metadata:
          labels:
            sidecar.istio.io/inject: "false"
          annotations:
            example.com/annotation: "value"
        # You can configure typical Kubernetes pod scheduling behavior via node selectors, tolerations, and node/pod affinities.
        node_selector:
          kubernetes.io/arch: amd64
        tolerations:
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: highmem
                      operator: In
                      values:
                        - "true"
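The pvc_name above must resolve to an existing PersistentVolumeClaim in the namespace where jobs run; when you deploy with the Helm chart, this storage configuration is set up for you. If you manage the claim yourself, a minimal sketch matching that name might look like the following, where the access mode, storage class, and size are assumptions to adapt to your cluster:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nemo-core-jobs-storage
spec:
  accessModes:
    - ReadWriteMany            # assumption: jobs may share the volume across nodes
  storageClassName: standard   # assumption: replace with your cluster's storage class
  resources:
    requests:
      storage: 100Gi           # assumption: size this for your job artifacts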
Secrets for Jobs#
Jobs that require secret values, such as API keys, use ephemeral secret storage.
When a job launches, secrets are stored in ephemeral secret storage and then accessed by the job. When a job terminates, the secrets are purged from ephemeral secret storage.
Vault / OpenBao#
The Core microservice supports Vault and OpenBao for configuring job secret storage.
Note: In this release, the Core microservice supports only token-based authentication. Using Vault/OpenBao in a production deployment is not currently recommended.
jobs:
  secrets:
    # Configure job secret storage to use Vault backend
    backend: vault
    # Vault is the Vault/OpenBao configuration
    vault:
      address: http://openbao:8200
      token: your-vault-token
      prefix: /nemo/jobs
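For local experimentation (the note above advises against Vault/OpenBao in production for now), you can point the address and token fields at a throwaway Vault server running in dev mode. The image, port, and root token below are assumptions based on the standard Vault dev-mode workflow, not part of the NeMo configuration:

docker run --rm -p 8200:8200 -e VAULT_DEV_ROOT_TOKEN_ID=your-vault-token hashicorp/vault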
Kubernetes#
The Core microservice supports Kubernetes Secrets for configuring job secret storage.
Note: When deploying using the NeMo Microservices Helm Chart, this configuration is applied automatically. The following shows the default values, provided for reference.
jobs:
  secrets:
    # Configure job secret storage to use Kubernetes backend
    backend: kubernetes
    # Kubernetes is the Kubernetes secret storage config
    kubernetes:
      # Configure access via in-cluster or kubeconfig. Defaults to in-cluster.
      config_type: in-cluster
      # The namespace where jobs are created and managed.
      # Defaults to the namespace where NMP is deployed.
      namespace: your-deployment-namespace
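Because secrets are purged when a job terminates, you can observe the ephemeral lifecycle by listing Secrets in the configured namespace while a job is running and again after it completes; the namespace below is the placeholder from the configuration above:

kubectl get secrets -n your-deployment-namespace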