User Guide#

Nsight Operator user guide covering gateway configuration, storage configuration, the CLI reference, the browser-based Cloud UI, and known limitations.

Gateway Configuration#

The Nsight Operator deploys an Envoy-based gateway that provides a unified HTTP entry point for the Coordinator and Analysis REST APIs. The gateway is managed via the NsightGateway CRD – the operator controller reconciles this CR to deploy the gateway Deployment, Service, and Envoy configuration.

By default, the gateway is enabled via the nsight-gateway.enabled value:

nsight-gateway:
  enabled: true
  port: 8888
  service:
    type: ClusterIP

Set nsight-gateway.enabled to false to disable deployment of the gateway.

The gateway automatically discovers NsightCoordinator and NsightAnalysis CRs in the same namespace and routes traffic to them.

Accessing the Gateway#

Recommended – autoconfigure (requires kubectl access):

The simplest way to configure the CLI is to use autoconfigure, which automatically discovers the gateway URL, authentication mechanism, and storage configuration from the cluster:

python3 nsight_operator.py autoconfigure -n <namespace>

This works with all service types (ClusterIP, LoadBalancer, NodePort). For ClusterIP services, the CLI will automatically set up port-forwarding for subsequent commands.

Alternative – manual configure (when kubectl access is not available):

ClusterIP (default) – Set up Port-Forward:

kubectl port-forward -n <namespace> svc/nsight-operator-gateway 8888:8888 &
python3 nsight_operator.py configure --gw http://localhost:8888

LoadBalancer – Use External IP:

GATEWAY_IP=$(kubectl get svc nsight-operator-gateway -n <namespace> -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
python3 nsight_operator.py configure --gw http://${GATEWAY_IP}:8888

NodePort – Use Node IP and Port:

GATEWAY_NODEPORT=$(kubectl get svc nsight-operator-gateway -n <namespace> -o jsonpath='{.spec.ports[0].nodePort}')
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
python3 nsight_operator.py configure --gw http://${NODE_IP}:${GATEWAY_NODEPORT}

TLS Configuration#

By default, the gateway serves traffic over plain HTTP. When the gateway is exposed outside the cluster, you should enable TLS to encrypt traffic between clients and the gateway.

TLS is configured by providing a Kubernetes TLS secret containing a certificate and private key. The secret must contain tls.crt and tls.key keys. Optionally, include a ca.crt key so that the CLI can verify the server certificate.
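If you only need a certificate for testing, one way to produce a self-signed certificate and key is with openssl (the CN below is a placeholder; substitute your gateway hostname):

```shell
# Generate a self-signed certificate and key (testing only).
# The CN is a placeholder hostname, not a value defined by the operator.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout tls.key -out tls.crt \
    -subj "/CN=gateway.example.com"
```

For production, use a certificate issued by a CA your clients already trust.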

Create the TLS secret:

kubectl create secret tls gateway-tls \
    --cert=tls.crt \
    --key=tls.key \
    -n <namespace>

Enable TLS via Helm values:

nsight-gateway:
  service:
    type: LoadBalancer
  tlsSecretRef:
    name: gateway-tls

When tlsSecretRef is set, the operator validates the secret, mounts it into the Gateway pod, and configures the listener for HTTPS.

Connecting with the CLI:

The CLI’s autoconfigure command automatically detects TLS and extracts the CA certificate from the secret for verification:

python3 nsight_operator.py autoconfigure -n <namespace>

When configuring manually, use an https:// URL:

python3 nsight_operator.py configure --gw https://<gateway-host>:8888

Note

For self-signed certificates, use autoconfigure which extracts the ca.crt from the Kubernetes secret automatically. Otherwise, add the CA to your system trust store.
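When configuring manually, the CA certificate can be extracted from the same TLS Secret (secret name gateway-tls as created above; adjust to yours) and then added to your trust store:

```shell
# Extract the CA certificate from the TLS Secret for client-side verification.
kubectl get secret gateway-tls -n <namespace> \
    -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
```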

Gateway Authentication#

The gateway supports three authentication modes: API key, JWT, and OAuth2. If none is enabled (the default), the gateway accepts requests without authentication – this “open” configuration is suitable for development or testing only and must not be used when the gateway is reachable outside the cluster.

Unauthenticated Mode (default)#

No configuration is required; the gateway listens on the configured port and accepts all requests. The CLI connects without any credentials:

python3 nsight_operator.py configure --gw http://<gateway-host>:8888

Warning

If you plan to expose the gateway via LoadBalancer, NodePort, or Ingress, enable at least one authentication mode below. Unauthenticated access allows anyone who can reach the gateway to start/stop profiling sessions and download all reports.
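For example, a minimal configuration that exposes the gateway while keeping it protected might combine a LoadBalancer service, TLS, and API-key authentication (a sketch assembled from the examples in this guide; secret names are placeholders):

```yaml
nsight-gateway:
  service:
    type: LoadBalancer
  tlsSecretRef:
    name: gateway-tls           # Kubernetes TLS secret, see TLS Configuration
  authentication:
    apikey:
      enabled: true
      keySecretRef:
        name: my-apikey-secret  # existing Secret holding the key
        key: api-key
```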

API-Key Authentication#

API-key authentication uses a single shared secret (the API key) that is provided at installation time.

Example (inline key):

nsight-gateway:
  authentication:
    apikey:
      enabled: true
      key: "secret-api-key"

Example (key from Kubernetes Secret):

nsight-gateway:
  authentication:
    apikey:
      enabled: true
      keySecretRef:
        name: my-apikey-secret
        key: api-key

Configure the CLI with an API key. To avoid leaking the key into shell history, pass it through an environment variable:

export NSIGHT_API_KEY=...  # paste or load from your secret store
python3 nsight_operator.py configure --gw http://localhost:8888 --apikey "${NSIGHT_API_KEY}"

Note

autoconfigure will automatically detect the API key from the cluster configuration and use it. No manual key entry is needed.

All HTTP requests to the services must carry the key’s value in an Authorization header:

Authorization: Bearer secret-api-key

OAuth2 Authentication#

OAuth2 authentication uses the Authorization Code Flow with an OIDC-compliant identity provider (e.g. Auth0, Okta, Keycloak, Azure AD). When enabled, the gateway:

  • Redirects unauthenticated browser requests to the IdP’s login page and sets a session cookie after successful authentication

  • Validates Bearer tokens from CLI/API clients against the IdP’s JWKS (discovered automatically via OIDC)

Note

OAuth2 requires TLS to be enabled on the gateway. The OAuth2 filter sets secure cookies that browsers will not send over plain HTTP. See TLS Configuration for setup instructions.

Example (Helm values with chart-managed Secret):

nsight-gateway:
  authentication:
    oauth2:
      enabled: true
      issuer: https://login.example.com
      clientId: "<client-id>"
      clientSecret: "<your-client-secret>"
      hmac: "<random-hmac-key>"
      scopes:
      - openid
      - profile
      - email

When OAuth2 is enabled through Helm with clientSecret and hmac set, the chart creates a Kubernetes Secret named <release-name>-gateway-oauth2 with the keys client_secret.key and hmac.key. The rendered NsightGateway references that Secret automatically.

The hmac value is a random secret used by the Gateway to sign session cookies. Generate one with openssl rand -base64 32. For production, do not commit clientSecret or hmac in values files. Supply them to Helm from a secret manager, CI secret, or uncommitted local secret file.
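If openssl is not available, an equivalent 32-byte random key can be generated with Python's standard library:

```python
import base64
import secrets

# Equivalent of `openssl rand -base64 32`: 32 random bytes, base64-encoded.
hmac_key = base64.b64encode(secrets.token_bytes(32)).decode("ascii")
print(hmac_key)  # 44-character base64 string
```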

Example (direct NsightGateway CR with existing Secret):

If you create an NsightGateway custom resource directly instead of using the Helm chart-managed Secret, create a Kubernetes Secret with both required keys and reference it with spec.authentication.oauth2.clientSecretRef.name.

apiVersion: v1
kind: Secret
metadata:
  name: gateway-oauth2
type: Opaque
stringData:
  client_secret.key: "<your-client-secret>"
  hmac.key: "<random-hmac-key>"
---
apiVersion: nvidia.com/v1alpha1
kind: NsightGateway
metadata:
  name: nsight-operator-gateway
spec:
  authentication:
    oauth2:
      enabled: true
      issuer: https://login.example.com
      clientId: "<client-id>"
      clientSecretRef:
        name: gateway-oauth2
      scopes:
      - openid
      - profile
      - email

Configure the CLI with autoconfigure:

python3 nsight_operator.py autoconfigure -n <namespace>

Then log in:

python3 nsight_operator.py login

Note

When using the Authorization Code Flow, register http://localhost:8400/callback as an allowed redirect URI with your identity provider.

JWT Authentication#

JWT authentication validates incoming Bearer tokens against a provided JWKS. This mode is suited for scenarios where you manage your own token issuance or need to verify tokens from a specific provider using an explicit key set.

Tip

If your identity provider supports OIDC, consider using OAuth2 authentication instead. OAuth2 mode automatically discovers the JWKS from the IdP and additionally provides browser-based login via the Authorization Code Flow.

Example (inline JWKS):

nsight-gateway:
  authentication:
    jwt:
      enabled: true
      issuer: https://example.com
      audiences:
      - nsight-cloud
      verificationJwks: |
        {"keys": [{"kty": "RSA", "e": "AQAB", "use": "sig", "kid": "keyid12345678", "alg": "RS256", "n": "rsa-public-modulus-value"}]}

Example (JWKS from Kubernetes Secret):

nsight-gateway:
  authentication:
    jwt:
      enabled: true
      issuer: https://example.com
      audiences:
      - nsight-cloud
      jwksSecretRef:
        name: my-jwks-secret
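Before pasting an inline JWKS into values, a quick stdlib-only sanity check of its shape can catch copy/paste mistakes (this checks structure only, not key validity; the sample document mirrors the example above):

```python
import json

# Sample JWKS; per RFC 7517, every JWK needs "kty", and RSA keys need "n" and "e".
jwks = '''{"keys": [{"kty": "RSA", "e": "AQAB", "use": "sig",
           "kid": "keyid12345678", "alg": "RS256",
           "n": "rsa-public-modulus-value"}]}'''
doc = json.loads(jwks)
assert isinstance(doc.get("keys"), list) and doc["keys"], "JWKS needs a non-empty 'keys' list"
for key in doc["keys"]:
    assert "kty" in key, "every JWK requires 'kty' (RFC 7517)"
    if key["kty"] == "RSA":
        assert {"n", "e"} <= key.keys(), "RSA keys require modulus 'n' and exponent 'e'"
print("JWKS structure looks OK")
```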

Note

OAuth2 and JWT authentication can be enabled simultaneously – the gateway will accept tokens from either provider.

Routing Configuration#

The gateway routes requests to backend services based on URL prefixes. The default routing prefixes can be customized:

nsight-gateway:
  routing:
    coordinatorPrefix: "/coordinator/"
    analysisPrefix: "/analysis/"

Integrated Storage Proxy (MinIO)#

When cloud storage is enabled with the integrated MinIO backend (cloudStorage.minio.enabled: true), the gateway automatically proxies MinIO traffic on port 9000.

This provides several benefits:

  • Single entry point: Clients only need access to the gateway service – no separate MinIO service access is required.

  • Simplified downloads: When using autoconfigure, the storage endpoint is automatically configured to point at the gateway’s MinIO proxy port.

  • Automatic port-forwarding: For ClusterIP services, the CLI automatically port-forwards both the HTTP port and the MinIO port when needed.

The MinIO proxy is enabled automatically when the gateway detects a cloudStorageRef pointing to an NsightCloudStorageConfig with MinIO enabled. No additional configuration is needed.

Warning

The MinIO proxy listener on the gateway bypasses gateway authentication. Access to MinIO is controlled only by the credentials in the storage configuration secret.

Before exposing the gateway externally:

  • Do not expose the MinIO proxy port (9000) on a public LoadBalancer; restrict it to internal networks or disable the listener entirely.

  • Enable TLS on the gateway (see TLS Configuration) so MinIO traffic is encrypted in transit.

  • Rotate the MinIO credentials in the storage configuration secret periodically and treat the secret as privileged.

Storage Configuration#

Nsight Operator reads and writes profiling reports through the NsightCloudStorageConfig CR. Out of the box the operator deploys an in-cluster MinIO instance for S3-compatible storage; you can also point at an external S3-compatible bucket or keep reports inside profiled Pods. This page focuses on how to choose a backend and covers the most common configurations – see the CRD reference for every field.

Note

The default in-cluster MinIO uses ephemeral storage (emptyDir). Reports are lost if the MinIO Pod restarts. Enable persistent storage (shown below) or use external S3 for anything other than short-lived experiments.

Choosing a backend#

  • Operator-managed MinIO – for single-cluster setups, demos, and dev clusters, with zero external dependencies. This is the default; optionally enable cloudStorage.minio.persistence.* to survive restarts.

  • External S3-compatible – for shared storage across clusters, existing lifecycle policies, or when data must live outside Kubernetes. Set cloudStorage.minio.enabled: false and supply a cloudStorage.secretRef pointing to a Secret with your S3 credentials.

  • Local (in-Pod) – for short-lived captures where the report can be retrieved with kubectl cp. Set NsightCloudStorageConfig.spec.storage_type: local. Not supported by nsight_operator.py download.

Enabling persistent MinIO#

cloudStorage:
  minio:
    persistence:
      enabled: true
      storageClassName: "your-storage-class"
      size: 20Gi

Using external S3#

Step 1 – create a Secret with the storage configuration:

apiVersion: v1
kind: Secret
metadata:
  name: my-s3-credentials
  namespace: nsight-operator
type: Opaque
stringData:
  storage-config.yaml: |
    storage_type: s3
    bucket_name: my-profiling-results
    aws_access_key_id: YOUR_ACCESS_KEY
    aws_secret_access_key: YOUR_SECRET_KEY
    region_name: us-west-2
    local_cache_dir: /tmp/s3-cache

Step 2 – point the chart at it (this disables the in-cluster MinIO):

cloudStorage:
  enabled: true
  minio:
    enabled: false
  secretRef:
    name: my-s3-credentials

Per-tenant storage with NsightCloudStorageConfig#

In multi-tenant clusters each tenant typically has its own storage. Create a NsightCloudStorageConfig in the tenant namespace and reference it from the tenant’s NsightOperatorProfileConfig:

apiVersion: nvidia.com/v1alpha1
kind: NsightCloudStorageConfig
metadata:
  name: team-storage-config
  namespace: my-team-ns
spec:
  enabled: true
  storage_type: s3
  secretRef:
    name: team-s3-credentials
---
apiVersion: nvidia.com/v1
kind: NsightOperatorProfileConfig
metadata:
  name: team-profile-config
  namespace: my-team-ns
spec:
  nsightToolConfigs:
    - name: "team-profile"
      cloudStorageConfigRef: "team-storage-config"

For every available field (persistence, service type, mount paths, etc.), see NsightCloudStorageConfig.

Nsight Cloud CLI#

The nsight_operator.py script manages profiling sessions via the Nsight Gateway HTTP API. It is the primary tool for profiling users.

A second helper script, nsight_operator_dynamo.py, is provided for installing and configuring the operator for NVIDIA Dynamo deployments. It delegates to Helm and kubectl and is only used at setup time; the actual profiling workflow still runs through nsight_operator.py. See Quick Start Example (NVIDIA Dynamo) for Dynamo-specific examples.

  • nsight_operator.py – profiling control (sessions, collections, downloads, analysis). Use it for every profiling workflow.

  • nsight_operator_dynamo.py – installs and configures Nsight Operator for Dynamo (Helm + kubectl helper). Use it for setup and teardown of Dynamo deployments.

Both scripts are distributed with the NVIDIA Nsight Operator Resources NGC resource.

Prerequisites#

  • Python 3.12 or higher

  • NVIDIA Nsight Operator Resources bundle downloaded from NGC (see Prerequisites). Unpack it locally; it contains:

    • nsight_operator.py – the profiling CLI used throughout this guide.

    • nsight_operator_dynamo.py – helper script for NVIDIA Dynamo setups (see Quick Start Example (NVIDIA Dynamo)).

    • requirements.txt – Python dependencies for both scripts.

    • examples/ – sample values files, profile configs, and manifests.

  • Python dependencies: from the unpacked bundle directory, run

    pip install -r requirements.txt
    
  • Gateway access: the script connects to the Coordinator via the Nsight Gateway HTTP endpoint. See Gateway Configuration for how to set up access to the gateway.

Usage Overview#

Before using any profiling commands, you must first configure the CLI to connect to the gateway. Use autoconfigure (recommended, requires kubectl access) or configure (manual). Credentials and connection details are stored in ~/.nsight-cloud.conf and persist between sessions.

Command-Line Syntax#

python3 nsight_operator.py [GLOBAL_OPTIONS] ACTION [ACTION_OPTIONS]

Common global options:

  • --tag <TAG>: Specify the tag name to control (defaults to default).

  • -v: Enable more verbose (debug-level) logging.

  • --session <UUID>: Target a specific session. Supported by profiler-start, profiler-stop, session-end, ls, and download.

Type python3 nsight_operator.py -h for full details on all arguments.

Actions#

1. autoconfigure#

Automatically configure the CLI by inspecting the Kubernetes cluster. This discovers the gateway URL, authentication mechanism, and storage configuration.

python3 nsight_operator.py autoconfigure -n <namespace> [-s <servicename>]

  • -n, --namespace: Kubernetes namespace where the gateway is deployed (required).

  • -s, --servicename: Name of the gateway Kubernetes Service (default: nsight-operator-gateway).

What it discovers:

  • Gateway URL: Derived from the Service type and its host/port.

  • Authentication: Reads the NsightGateway CR to detect the auth mechanism.

  • Storage configuration: If integrated MinIO is enabled, the storage endpoint is automatically set to the gateway’s MinIO proxy port (9000).

  • Port-forwarding: For ClusterIP services, the CLI will automatically set up kubectl port-forward for subsequent commands.

2. configure#

Manually configure the CLI by specifying the gateway URL and optional authentication credentials.

python3 nsight_operator.py configure --gw <GATEWAY_URL> [AUTH_OPTIONS]

Option

Description

--gw

Full URL of the gateway (required). Example: https://gateway.example.com:8888

--apikey

API key for authentication. Mutually exclusive with OAuth2 options.

--issuer

OIDC issuer URL for OAuth2/JWT authentication.

--clientid

OAuth2 client ID. Must be provided together with --issuer.

--oauth-flow-type

OAuth2 login flow: code (Authorization Code, default) or device (Device Authorization Grant).

--authorization-endpoint

Explicit OAuth2 authorization endpoint URL. Optional; defaults to the value discovered from the issuer’s .well-known/openid-configuration. Supply this when your IdP does not expose OIDC discovery.

--token-endpoint

Explicit OAuth2 token endpoint URL. Optional; defaults to the value discovered from the issuer’s .well-known/openid-configuration.

Note

configure does not set up storage configuration or automatic port-forwarding. To download profiling results, you will need to configure storage access separately. See Configuring Storage Access for Downloads.

3. login#

Authenticate with the identity provider. Required when OAuth2 or JWT authentication is configured. Not needed for API key or no-auth setups.

python3 nsight_operator.py login

  • Authorization Code Flow (code, default): Opens your browser to the IdP’s login page. Register http://localhost:8400/callback as an allowed redirect URI with your IdP.

  • Device Authorization Grant (device): Displays a verification URL and user code in the terminal.

4. session-begin#

Explicitly open a new profiling session with an optional human-readable title. This is optional – the first profiler-start implicitly opens a session if none is active. Using session-begin lets you assign a title that is used as the top-level directory name when downloading results.

python3 nsight_operator.py [--tag <TAG>] session-begin [--title <TITLE>]

  • --title, -t: Human-readable title for the session. Used as the top-level directory name (<YYYYMMDDHHMMSS>-<title>) when downloading results.

Tip

Give your sessions descriptive titles to make downloaded reports easy to identify – for example session-begin --title "baseline-attention-heads".

5. profiler-start#

Start collecting profiling data. If no session is active, one will be created automatically.

python3 nsight_operator.py [--tag <TAG>] profiler-start [--session <UUID>]

6. profiler-stop#

Stop the active data collection, but keep the session open.

python3 nsight_operator.py [--tag <TAG>] profiler-stop [--session <UUID>]

7. session-end#

End the profiling session.

python3 nsight_operator.py [--tag <TAG>] session-end [--session <UUID>]

8. ls#

List the profiling artifacts (files) associated with the session.

python3 nsight_operator.py [--tag <TAG>] ls [--session <UUID>]

9. download#

Download profiling artifacts to your local machine.

python3 nsight_operator.py [--tag <TAG>] download [--session <UUID>] [OPTIONS]

  • --output-dir, -o: Directory where results will be downloaded (default: current directory). Example: -o /tmp/nsight_profiles

  • --flat: Save all files into the output directory with flattened names instead of a directory tree.

  • --session: Download files for a specific session (default: current active session). Example: --session <UUID>

By default, downloaded files are organized into a directory tree:

<output-dir>/
  <YYYYMMDDHHMMSS>-<session-title>/
    <collectionID>/
      <process_name>_<tag>_<instance_name>_<file_stub>.nsys-rep
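Put together, a typical end-to-end run chains the session, profiler, and download commands (a sketch; assumes the CLI is already configured via autoconfigure or configure):

```shell
python3 nsight_operator.py session-begin --title "baseline-attention-heads"
python3 nsight_operator.py profiler-start
# ... run the workload you want to profile ...
python3 nsight_operator.py profiler-stop
python3 nsight_operator.py session-end
python3 nsight_operator.py download -o /tmp/nsight_profiles
```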

Configuring Storage Access for Downloads#

Note

If storage is configured as local type, profiling results are stored directly in the pods and must be downloaded manually using kubectl cp or similar methods.

If you used autoconfigure – storage access is already configured. Just run download directly.

If you used configure – you need to configure storage access manually:

Step 1: Extract the storage configuration from the cluster

export NSIGHT_CLOUD_STORAGE_CONFIG_FILE=/tmp/storage-config.yaml
kubectl get secret nsight-operator-cloud-storage-secret -n <namespace> \
    -o jsonpath='{.data.storage-config\.yaml}' | base64 -d > $NSIGHT_CLOUD_STORAGE_CONFIG_FILE

Step 2: Configure the storage endpoint

For ClusterIP (default), port-forward the gateway:

kubectl port-forward -n <namespace> svc/nsight-operator-gateway 9000:9000 &
sed -i 's|endpoint_url: .*|endpoint_url: http://localhost:9000|' $NSIGHT_CLOUD_STORAGE_CONFIG_FILE

Step 3: Run the download command

python3 nsight_operator.py download --output-dir /tmp/nsight_profiles
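Once downloaded, the directory tree shown above can be traversed programmatically, for example to list every collected report (the output directory is a placeholder matching the command above):

```python
from pathlib import Path

# Placeholder path matching the --output-dir used above.
root = Path("/tmp/nsight_profiles")
if root.exists():
    for report in sorted(root.rglob("*.nsys-rep")):
        # Prints <YYYYMMDDHHMMSS>-<session-title>/<collectionID>/<report-file>
        print(report.relative_to(root))
```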

10. status#

Show the current status of the profiling tag, including connected collection agents and any active sessions.

python3 nsight_operator.py status [--tag <TAG>]

11. analysis#

Run Nsight Systems recipes on collected profiles and manage their reports. This is a command group with multiple subcommands.

python3 nsight_operator.py analysis SUBCOMMAND [OPTIONS]

  • analysis list: List all available recipes.

  • analysis info <name>: Get detailed information about a recipe.

  • analysis run <name> [args...]: Create a job to run a recipe. Accepts --session and --collection to scope the input data.

  • analysis ui: Open the analysis UI for the current session in the default browser.

  • analysis reports list: List all analysis jobs (optionally filtered by --status).

  • analysis reports info <id>: Get details for a specific job.

  • analysis reports logs <id>: View logs for a job.

  • analysis reports download <id>: Download the files produced by a job.

  • analysis reports open <id>: Open the Jupyter notebook(s) produced by a job in the default browser.

  • analysis reports cancel <id>: Cancel a running job.

  • analysis reports delete <id>: Delete a job.

See Analysis Guide for a full reference, examples, and the underlying REST API.

Nsight Cloud UI#

The Nsight Cloud UI is a browser-based single-page application that provides a visual front-end for Nsight Operator. It lets you browse profiling sessions, inspect collections, run analysis recipes, and launch Nsight Streamer instances for individual reports – all through the same NsightGateway used by the CLI.

Two cluster resources back the UI:

  • NsightCloudUI – serves the web UI at the gateway root.

  • NsightTenantOperator – a tenant-scoped REST API that the UI calls to create and tear down per-session streamers. The operator reconciles this CR with its own ServiceAccount, RBAC, and Deployment in each namespace that has a NsightCloudUI.

Both CRs are created automatically by the parent Helm chart when their respective enabled values are left at their defaults (nsight-cloud-ui.enabled: true, nsight-tenant-operator.enabled: true).

Accessing the UI#

Because the UI is served at the gateway root, any path you already use to reach the gateway – autoconfigure port-forward, LoadBalancer IP, NodePort, etc. (see Gateway Configuration) – also serves the UI.

Port-forward example:

kubectl port-forward -n <namespace> svc/nsight-operator-gateway 8888:8888

Then open http://localhost:8888/ in a browser. Authentication uses whatever mechanism is configured on the gateway; see Gateway Authentication.

What you can do from the UI#

  • Browse sessions and collections. The main list shows every session visible to the gateway, including its title, state, and collection history. Clicking a session opens a detail page with its collections and reports.

  • Run analysis recipes. The Analysis tab exposes the same recipes available from nsight_operator.py analysis; see Analysis Guide. Completed jobs link directly to their Jupyter notebooks.

  • Launch streamers per session. “View Trace” triggers the tenant operator to create a NsightStreamer scoped to the current session, so you can open reports in a browser without downloading them. Concurrent launches are capped by streamerLaunch.maxActive on the NsightTenantOperator (default 10 via the parent Helm chart).

  • Control active profiling. Start / stop collections and end sessions from the session list; the UI calls the coordinator via the gateway just like the CLI.

Relationship to the CLI#

Everything the UI does is available from the CLI, and vice versa; they are two front-ends to the same gateway APIs. Teams often mix both – the UI for interactive analysis, the CLI for scripted / CI workflows.

Known Limitations and Notes#

CLI Limitations#

  • nsight_operator.py does not currently parse or apply Nsight Systems arguments like --sampling-rate or --nvtx at invocation time. Profiling options must be configured via NsightOperatorProfileConfig or Helm values (nsight-injector.nsightToolConfig.nsightToolArgs) and take effect when target Pods start.

Tool Support#

  • Nsight Operator integrates with Nsight Systems (nsys) only. Other Nsight tools (Nsight Compute, Nsight Graphics, Nsight Deep Learning Designer) are out of scope for the operator.

  • Nsight Streamer GPU hardware acceleration requires an Ada Lovelace or newer NVIDIA GPU with AV1 encode support, plus the NVIDIA container runtime (runtimeClassName: nvidia) and GPU drivers on the host.

Storage#

  • The default operator-managed MinIO deployment uses ephemeral storage (emptyDir). Reports are lost if the MinIO Pod restarts. For production use, enable persistent storage (see Storage Configuration) or bring your own S3-compatible backend.

  • When spec.storage_type: local is used on NsightCloudStorageConfig, reports are stored inside the target Pod’s filesystem and must be retrieved with kubectl cp – nsight_operator.py download does not support this storage type.

Profiling Constraints#

  • Existing Pods are not retroactively mutated. Pod labels and profile configs only apply to Pods admitted after the webhook is installed and the rules are in place. Restart the owning Deployment / StatefulSet to enable profiling; see Enabling Profiling on Target Resources.

  • Single GPU-metrics collector per GPU. Only one profiling run with --gpu-metrics-devices may execute on a given GPU at a time. If the NVIDIA GPU Operator’s nvidia-dcgm-exporter DaemonSet is active, it must be temporarily disabled during profiling; see Troubleshooting.

  • Profiling requires adjusting kernel.perf_event_paranoid on worker nodes (default value written by the operator is 2). Clusters that disallow node-level sysctl changes must manage this setting out-of-band (machineConfig: null) – see Troubleshooting for OpenShift guidance.

Platform Support#

  • Supported node architectures: x86_64 and aarch64 (SBSA).

  • Kubernetes: v1.19+ with the admissionregistration.k8s.io/v1 API enabled (see Prerequisites).

  • Python: 3.12 or newer is required for the CLI.

  • OpenShift: Supported, with additional setup for Security Context Constraints and node configuration (see OpenShift).

Networking / Gateway#

  • OAuth2 requires TLS on the gateway. Browsers do not send secure session cookies over plain HTTP; enable TLS before enabling OAuth2. See TLS Configuration.

  • The integrated MinIO proxy listener on the gateway does not enforce gateway authentication – MinIO access is controlled by the credentials in the storage-config Secret. Do not expose port 9000 externally without additional network controls.

  • Coordinator high availability is limited: the default deployment runs a single Coordinator replica per coordinator-enabled namespace. The operator controller itself supports leader election for multi-replica installs (leaderElection.enabled: true).