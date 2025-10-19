In order to run sosreport, a kubeconfig is needed to access the API server.

1. Create a secret containing the kubeconfig

kubectl create secret generic admin-config --from-file=kubeconfig=<path_to_kubeconfig>





1. List the nodes in the cluster and export the name of the selected node as NODE_NAME, which the manifest in the next step references. The following command lists the nodes:

kubectl get nodes

2. Then create a debug pod by deploying the following manifest:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: dpf-sosreport
spec:
  nodeName: ${NODE_NAME}
  containers:
  - name: sosreport
    image: ghcr.io/nvidia/sosreport:latest
    env:
    - name: CASE_ID
      value: "${CASE_ID}"
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
      runAsUser: 0
    volumeMounts:
    - mountPath: /host
      name: host
    - mountPath: /run
      name: run
    - mountPath: /var/log
      name: varlog
    # sosreport checks whether this file exists before executing the kubernetes plugin;
    # without it, no kubernetes output will be available.
    - mountPath: /etc/kubernetes/admin.conf
      name: adminconf
      subPath: kubeconfig
    - mountPath: /etc/localtime
      name: localtime
    - mountPath: /etc/machine-id
      name: machineid
    - mountPath: /boot
      name: boot
    - mountPath: /usr/lib/modules/
      name: modules
  volumes:
  - hostPath:
      path: /
    name: host
  - hostPath:
      path: /run
    name: run
  - hostPath:
      path: /boot
    name: boot
  - hostPath:
      path: /usr/lib/modules/
    name: modules
  - hostPath:
      path: /var/log
    name: varlog
  - secret:
      secretName: admin-config
    name: adminconf
  - hostPath:
      path: /etc/localtime
    name: localtime
  - hostPath:
      path: /etc/machine-id
    name: machineid
  restartPolicy: Never
  hostIPC: true
  hostNetwork: true
  hostPID: true
EOF
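
The manifest above is passed through an unquoted heredoc, so the shell expands ${NODE_NAME} and ${CASE_ID} before kubectl sees them. If either variable is empty, the pod may be created without a node pin or with an empty case ID. The following is a minimal sketch of a pre-flight check; the helper name `require_vars` is hypothetical and not part of the original procedure.

```shell
# Hypothetical helper (not from the original procedure): fail early when
# any of the named shell variables is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Read the value of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example usage, with placeholder values to be replaced:
#   export NODE_NAME=<node_name>
#   export CASE_ID=<case_id>
#   require_vars NODE_NAME CASE_ID && echo "ready to create the debug pod"
```

Running the check before the `cat <<EOF | kubectl create -f -` command makes a missing export visible immediately instead of surfacing later as a pod scheduled on an unexpected node.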

In order to run sosreport, a kubeconfig is needed to access the API server. When the report has to be generated for a tenant cluster, the kubeconfig has to be retrieved from the host cluster.

1. Get the kubeconfig name from the dpucluster spec.

export KUBECONFIG_NAME=$(kubectl get dpucluster -n ${NAMESPACE} ${CLUSTER_NAME} -o jsonpath='{.spec.kubeconfig}')

2. Create the kubeconfig from the secret data

kubectl get secrets -n ${NAMESPACE} ${KUBECONFIG_NAME} -o json \
  | jq -r '.data["super-admin.conf"]' \
  | base64 --decode \
  > /tmp/${NAMESPACE}-${CLUSTER_NAME}.kubeconfig

3. Create a secret containing the kubeconfig in the tenant cluster

kubectl create secret generic admin-config --from-file=kubeconfig=/tmp/${NAMESPACE}-${CLUSTER_NAME}.kubeconfig \
  --kubeconfig=/tmp/${NAMESPACE}-${CLUSTER_NAME}.kubeconfig
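
If the secret does not contain the `super-admin.conf` key, `jq -r` prints `null` and the decode step still writes a file, so the extracted kubeconfig can be silently invalid. The following is a rough sanity check, assuming the kubeconfig is stored in YAML form with top-level `clusters:` and `users:` keys; the helper name `looks_like_kubeconfig` is hypothetical, and the check is a heuristic, not a full validation.

```shell
# Hypothetical helper (not from the original procedure): succeed only if
# the file is non-empty and has the top-level keys expected in a YAML
# kubeconfig. This is a heuristic and does not validate credentials.
looks_like_kubeconfig() {
  [ -s "$1" ] || return 1
  grep -q '^clusters:' "$1" && grep -q '^users:' "$1"
}

# Example usage with the path written in step 2:
#   looks_like_kubeconfig "/tmp/${NAMESPACE}-${CLUSTER_NAME}.kubeconfig" \
#     || echo "extracted kubeconfig looks invalid" >&2
```

Running it between steps 2 and 3 avoids storing an unusable file in the `admin-config` secret.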





1. List the nodes in the cluster and export the name of the selected node as NODE_NAME, which the manifest in the next step references. The following command lists the nodes:

kubectl get nodes

2. Then create a debug pod by deploying the following manifest:

cat <<EOF | kubectl --kubeconfig=/tmp/${NAMESPACE}-${CLUSTER_NAME}.kubeconfig create -f -
apiVersion: v1
kind: Pod
metadata:
  name: dpf-sosreport
spec:
  nodeName: ${NODE_NAME}
  containers:
  - name: sosreport
    image: ghcr.io/nvidia/sosreport:latest
    env:
    - name: CASE_ID
      value: "${CASE_ID}"
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
      runAsUser: 0
    volumeMounts:
    - mountPath: /host
      name: host
    - mountPath: /run
      name: run
    - mountPath: /var/log
      name: varlog
    # sosreport checks whether this file exists before executing the kubernetes plugin;
    # without it, no kubernetes output will be available.
    - mountPath: /etc/kubernetes/admin.conf
      name: adminconf
      subPath: kubeconfig
    - mountPath: /etc/localtime
      name: localtime
    - mountPath: /etc/machine-id
      name: machineid
    - mountPath: /boot
      name: boot
    - mountPath: /usr/lib/modules/
      name: modules
  volumes:
  - hostPath:
      path: /
    name: host
  - hostPath:
      path: /run
    name: run
  - hostPath:
      path: /boot
    name: boot
  - hostPath:
      path: /usr/lib/modules/
    name: modules
  - hostPath:
      path: /var/log
    name: varlog
  - secret:
      secretName: admin-config
    name: adminconf
  - hostPath:
      path: /etc/localtime
    name: localtime
  - hostPath:
      path: /etc/machine-id
    name: machineid
  restartPolicy: Never
  hostIPC: true
  hostNetwork: true
  hostPID: true
EOF
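
Because the pod uses `restartPolicy: Never`, it ends in either the `Succeeded` or the `Failed` phase once its container exits. The following is a minimal polling sketch, assuming the sosreport container exits when collection finishes (the source does not state this). The function names and the `POLL_INTERVAL` variable are hypothetical; extra arguments such as `--kubeconfig=...` are passed through to kubectl.

```shell
# Hypothetical helpers (not from the original procedure).

# Print the current phase of a pod. $1 is the pod name; any remaining
# arguments are forwarded to kubectl (for example --kubeconfig=<path>).
pod_phase() {
  pod="$1"
  shift
  kubectl get pod "${pod}" "$@" -o jsonpath='{.status.phase}'
}

# Poll until the pod reaches a terminal phase.
# Returns 0 on Succeeded, 1 on Failed, 2 if kubectl itself fails.
wait_for_pod_done() {
  pod="$1"
  shift
  while :; do
    phase="$(pod_phase "${pod}" "$@")" || return 2
    case "${phase}" in
      Succeeded) return 0 ;;
      Failed) return 1 ;;
    esac
    sleep "${POLL_INTERVAL:-10}"
  done
}

# Example usage for the tenant cluster:
#   wait_for_pod_done dpf-sosreport \
#     --kubeconfig="/tmp/${NAMESPACE}-${CLUSTER_NAME}.kubeconfig"
```

The same helper applies to the host cluster case by omitting the `--kubeconfig` argument. The loop has no timeout, so interrupt it manually if the pod stays in `Pending` or `Running` longer than expected.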

The final report archive is available under /tmp in the node filesystem.

To extract it, run:

tar -x --xz -f sosreport-<node_name>-<case_id>-<date>-xxx.tar.xz
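
Since the date and the trailing suffix of the archive name are not known in advance, it can help to locate the file by node name and case ID. The following is a minimal sketch based on the naming pattern shown above; the helper name `find_sos_archives` is hypothetical.

```shell
# Hypothetical helper (not from the original procedure): list archives in
# a directory that match sosreport-<node_name>-<case_id>-*.tar.xz.
# $1: directory to search, $2: node name, $3: case ID
find_sos_archives() {
  find "$1" -maxdepth 1 -type f -name "sosreport-$2-$3-*.tar.xz" | sort
}

# Example usage on the node, extracting each match in the current directory:
#   for archive in $(find_sos_archives /tmp "${NODE_NAME}" "${CASE_ID}"); do
#     tar -x --xz -f "${archive}"
#   done
```

The search is limited to the top level of the given directory, so archives from other nodes or case IDs in the same directory are left untouched.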



