1. Introduction & Personas
Congratulations on your new DGX Cloud cluster!
This guide provides the information that the cluster owner, cluster admins, and cluster users need to get started with their primary responsibilities on their DGX Cloud cluster.
The intended workflow of the guide starts with the cluster owner, who is the main contact for managing the DGX Cloud subscription and cluster.
More detailed information about user roles and functionalities can be found in the corresponding guides:
NVIDIA DGX Cloud Slurm Cluster Admin Guide (For cluster owners and admins)
NVIDIA DGX Cloud Slurm Cluster User Guide (For cluster users)
1.1. Cluster Owner
The cluster owner is responsible for:
Onboarding cluster admins and users via cmsh
Enrolling and inviting admins and users to NGC
Activating their subscription and registering for NVIDIA Enterprise Support
Collaborating with NVIDIA and cluster admins to troubleshoot and report any issues related to their DGX Cloud cluster
1.2. Cluster Admin
For a cluster admin, common tasks include:
Onboarding cluster users via Base View and creating/managing teams/accounts
Managing the cluster’s high-performance filesystem
Configuring Slurm queues and user quotas
Deeper inspection and manipulation of the Slurm job queue state
Debugging cluster behavior concerns
1.3. Cluster User
For a cluster user, common tasks include:
Scheduling compute jobs in Slurm
Ad hoc Slurm job queue interaction
Downloading source code or datasets
Manipulating configuration files
2. Cluster Owner Steps
2.1. Prerequisites
As a cluster owner, ensure the following steps have been completed:
Your Technical Account Manager (TAM) should have already reached out to you as the organization admin. During this process:
The TAM should have created a shared communication channel with you. Use this channel for any questions or issues during your experience on DGX Cloud.
You should have created an SSH key pair, and sent the public key to your TAM for initial access to the cluster.
To access the cluster, the TAM will provide the following information:
Head Node:
<IP address of head node>
Login Nodes:
<IP addresses of login nodes>
You will use the head node to manage the BCM installation, cluster configuration, and admin/user onboarding. The tool you will use for cluster configuration is cmsh (cluster management shell).
You will have SSH access to the head node via the root user and can SSH into the login nodes from the head node.
Cluster admins and users will primarily use the login nodes for their day-to-day work on the DGX Cloud cluster. They will have SSH and Base View access to the login nodes only.
As a cluster owner, you will be responsible for creating user accounts for cluster admins and users.
Important
As a security best practice, usage of the root account should be minimized. Instructions for creating a non-root user that the cluster owner can use to access the head node can be found in the Enable an Alternative Cluster Owner Account section of the NVIDIA DGX Cloud Cluster Administration Guide.
The root account used by the cluster owner on the head node should not be used to run jobs in the Slurm cluster. Only cluster admins and users should run jobs on the Slurm cluster, via the login nodes.
If needed, the cluster owner should create their own separate admin and/or user accounts to access the login nodes for work that does not require root access.
2.2. Accessing the Head Node as Root (Cluster Owner Only)
As the cluster owner, you can access the head node using the SSH key pair you provided to your Technical Account Manager (TAM). To do this, use the following command:
ssh -i /path/to/ssh_cert root@ip-addr-of-head-node
Note
If you encounter any issues while trying SSH, refer to Troubleshooting for assistance.
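If you connect frequently, an SSH client configuration entry can shorten the command. The snippet below is a convenience sketch, assuming an OpenSSH client; the host alias dgxc-head is arbitrary, and the address and key path are the placeholders your TAM provided:

# ~/.ssh/config (illustrative entry; alias and paths are placeholders)
Host dgxc-head
    HostName <ip-addr-of-head-node>
    User root
    IdentityFile /path/to/ssh_cert

With this entry in place, ssh dgxc-head is equivalent to the full command above.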
2.3. Adding Cluster Admins Via cmsh
As a cluster owner, you can add admins to help manage the Slurm cluster using the following steps.
Compile a list of cluster admins: Make a list of people who will require admin access to the cluster.
Create SSH key pairs: Ask each cluster admin to create an SSH key pair for themselves using the following command:
ssh-keygen -t rsa -b 4096 -f ~/.ssh/<cluster-admin>-rsa-dgxc -C "cluster_admin_email@example.com"
Obtain public keys: Once each admin has an SSH key pair, have each send you the contents of their public key (<cluster-admin>-rsa-dgxc.pub) file generated in their ~/.ssh/ directory. You will use this information in the following steps to create the cluster admin user.
Create a cluster admin user: From the head node as root, run the following commands to create a cluster admin.
Enter cmsh with this command:
cmsh
Run the following commands within cmsh:
user
add <cluster-admin>
set password
set profile tenantadmin
commit

group
use tenantadmin
append members <cluster-admin>
commit
quit
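For repeat onboarding, cmsh can also be driven non-interactively. A minimal sketch, assuming your cmsh build supports the -c flag (interactive prompts such as set password are still easier to handle in the shell itself):

# Hypothetical one-liner; <cluster-admin> is a placeholder
cmsh -c "user; add <cluster-admin>; set profile tenantadmin; commit; group; use tenantadmin; append members <cluster-admin>; commit"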
Switch to the user’s account:
sudo su - <cluster-admin>
Add their SSH public key (obtained during Step 3 above) to the authorized_keys file using a text editor of your choice. For example:
nano $HOME/.ssh/authorized_keys
Configure their admin user account to automatically load the slurm module upon login. This will avoid the user having to run the module load slurm command every time on login. Run the following command to do so:
module initadd slurm
Exit the admin user’s account:
exit
Run the following commands to add the admin user as a Slurm admin:
module load slurm
sacctmgr add User User=<cluster-admin> Account=root AdminLevel=Administrator
Commit the changes when prompted.
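(Optional) To confirm the admin is now registered in Slurm's accounting database, you can query sacctmgr, for example:

module load slurm
sacctmgr show user <cluster-admin> withassoc format=User,Account,AdminLevel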
(Optional) Create a shared scratch space on LustreFS for the admin user: If the cluster admin will be running Slurm jobs, you can configure their user to have a scratch space on the Lustre shared filesystem, or they can configure this themselves if needed (using sudo). Follow the steps below to do so (a scripted variant for onboarding many admins is sketched after Step 6).
Run the following commands to create the admin user’s scratch space on the Lustre filesystem:
mkdir -p /lustre/fs0/scratch/<cluster-admin>
chown <cluster-admin>:<cluster-admin> /lustre/fs0/scratch/<cluster-admin>
(Optional) You can then assign a quota to the admin user if necessary, using the commands below. More details can be found in the Managing Lustre Storage section of the NVIDIA DGX Cloud Cluster Administration Guide.
# see current quota
lfs quota -u <cluster-admin> -v /lustre/fs0/scratch/<cluster-admin>

# example output of lfs quota for a user named demo-user
Disk quotas for usr demo-user (uid 1004):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/lustre/fs0/scratch/demo-user/
                      4       0       0       -       1       0       0       -
lustrefs-MDT0000_UUID
                      4       -       0       -       1       -       0       -
lustrefs-OST0000_UUID
                      0       -       0       -       -       -       -       -
lustrefs-OST0001_UUID
                      0       -       0       -       -       -       -       -
lustrefs-OST0002_UUID
                      0       -       0       -       -       -       -       -
lustrefs-OST0003_UUID
                      0       -       0       -       -       -       -       -
lustrefs-OST0004_UUID
                      0       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
uid 1004 is using default block quota setting
uid 1004 is using default file quota setting

# set quota, e.g., 100G, and inodes
lfs setquota -u <cluster-admin> -b 100G -B 100G -i 10000 -I 11000 /lustre/fs0/scratch/<cluster-admin>

# example output after running setquota for a user named demo-user
Disk quotas for usr demo-user (uid 1004):
     Filesystem  kbytes      quota      limit   grace   files   quota   limit   grace
/lustre/fs0/scratch/demo-user/
                      4  104857600  104857600       -       1   10000   11000       -
lustrefs-MDT0000_UUID
                      4*         -          4       -       1*      -       1       -
lustrefs-OST0000_UUID
                      0          -          0       -       -       -       -       -
lustrefs-OST0001_UUID
                      0          -          0       -       -       -       -       -
lustrefs-OST0002_UUID
                      0          -          0       -       -       -       -       -
lustrefs-OST0003_UUID
                      0          -          0       -       -       -       -       -
lustrefs-OST0004_UUID
                      0          -          0       -       -       -       -       -
Send the information to the admin: The admin is now set up to access the cluster and start working. Send the following information to the admin:
Login node addresses
Their username and password information
Which SSH public key you configured for their user
(Optional) Their LustreFS scratch directory information
Each cluster admin should now be able to log on to the login nodes (listed in the Prerequisites section) using the following command:
ssh -i /path/to/cluster_admin_ssh_cert <cluster-admin>@ip-addr-of-login-node
Repeat Steps 1 through 6 for each cluster admin user you want to create.
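If you are onboarding several admins at once, the scratch-space steps above lend themselves to a small loop. A hedged sketch, assuming a file admins.txt (hypothetical) containing one username per line and the same quota values used above:

# Provision scratch space and quotas for each admin listed in admins.txt
while read -r u; do
  mkdir -p /lustre/fs0/scratch/"$u"
  chown "$u":"$u" /lustre/fs0/scratch/"$u"
  lfs setquota -u "$u" -b 100G -B 100G -i 10000 -I 11000 /lustre/fs0/scratch/"$u"
done < admins.txt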
2.4. Adding Cluster Users Via cmsh
As a cluster owner, you’ll need to gather some information to onboard users to the cluster.
Compile a list of cluster users: Start by compiling a list of users who need access to the cluster.
Create SSH key pairs: Each user will need to create an SSH key pair for themselves with the following command:
ssh-keygen -t rsa -b 4096 -f ~/.ssh/<cluster-user>-rsa-dgxc -C "your_email@example.com"
Obtain public keys: Once each user has created an SSH key pair, have them send you the contents of their public key (<cluster-user>-rsa-dgxc.pub) file located in their ~/.ssh/ directory. You will use this in the following steps to create the cluster user.
Create a cluster user: From the head node as root, run the following commands to create a cluster user.
Enter cmsh with this command:
cmsh
Within cmsh, run the following commands to create a cluster user:
user
add <cluster-user>
set password
set profile portal
commit
quit
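(Optional) Before switching to the new account, you can verify that it exists with standard Linux lookups; both commands should report the new user once the commit completes:

getent passwd <cluster-user>
id <cluster-user>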
Switch to the user’s account using the following command:
sudo su - <cluster-user>
Add the user’s SSH public key (obtained earlier) into the authorized_keys file in their ~/.ssh/ directory, using the text editor of your choice. For example:
nano $HOME/.ssh/authorized_keys
Configure their user account to automatically load the slurm module upon login. This will avoid the user having to run the module load slurm command every time on login. Run the following command to do so:
module initadd slurm
Exit the user’s account by running the following command:
exit
Create a shared scratch space on LustreFS for the user: Next, create a LustreFS directory for the user. Follow the steps below to create and configure shared storage for the user.
Run the following commands to create the user scratch space on the Lustre filesystem:
mkdir -p /lustre/fs0/scratch/<cluster-user>
chown <cluster-user>:<cluster-user> /lustre/fs0/scratch/<cluster-user>
(Optional) You can then assign a quota to the user if necessary, using the commands below. More details can be found in the Managing Lustre Storage section of the NVIDIA DGX Cloud Cluster Administration Guide.
# see current quota
lfs quota -u <cluster-user> -v /lustre/fs0/scratch/<cluster-user>

# example output of lfs quota for a user named demo-user
Disk quotas for usr demo-user (uid 1004):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/lustre/fs0/scratch/demo-user/
                      4       0       0       -       1       0       0       -
lustrefs-MDT0000_UUID
                      4       -       0       -       1       -       0       -
lustrefs-OST0000_UUID
                      0       -       0       -       -       -       -       -
lustrefs-OST0001_UUID
                      0       -       0       -       -       -       -       -
lustrefs-OST0002_UUID
                      0       -       0       -       -       -       -       -
lustrefs-OST0003_UUID
                      0       -       0       -       -       -       -       -
lustrefs-OST0004_UUID
                      0       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
uid 1004 is using default block quota setting
uid 1004 is using default file quota setting

# set quota, e.g., 100G, and inodes
lfs setquota -u <cluster-user> -b 100G -B 100G -i 10000 -I 11000 /lustre/fs0/scratch/<cluster-user>

# example output after running setquota for a user named demo-user
Disk quotas for usr demo-user (uid 1004):
     Filesystem  kbytes      quota      limit   grace   files   quota   limit   grace
/lustre/fs0/scratch/demo-user/
                      4  104857600  104857600       -       1   10000   11000       -
lustrefs-MDT0000_UUID
                      4*         -          4       -       1*      -       1       -
lustrefs-OST0000_UUID
                      0          -          0       -       -       -       -       -
lustrefs-OST0001_UUID
                      0          -          0       -       -       -       -       -
lustrefs-OST0002_UUID
                      0          -          0       -       -       -       -       -
lustrefs-OST0003_UUID
                      0          -          0       -       -       -       -       -
lustrefs-OST0004_UUID
                      0          -          0       -       -       -       -       -
Note
If no quota is set, the user has unlimited storage access to the whole Lustre filesystem, and if not careful, can consume the entire filesystem.
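To watch for runaway usage, filesystem-wide and per-user consumption can be checked at any time, for example:

# overall Lustre capacity and usage
lfs df -h /lustre/fs0
# usage for a specific user anywhere on the filesystem
lfs quota -u <cluster-user> /lustre/fs0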
Send the information to the user: The user is now set up to access the cluster and start working. Send the following information to the user:
Login node addresses
Their username and password information
Which SSH public key you configured for their user
Their LustreFS scratch directory information
Each user should now be able to log on to the login nodes (listed in the Prerequisites section) using the following command:
ssh -i /path/to/cluster_user_ssh_cert <cluster-user>@ip-addr-of-login-node
Repeat Steps 1 through 6 for each cluster user you want to create.
(Optional) Create a list of cluster teams or Slurm accounts: Refer to Setting Up Fair Share Scheduling and Teams for more information.
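As a rough illustration of what a team maps to in Slurm terms, an account can be created and a user attached to it with sacctmgr; the account name research-team below is a placeholder, and the referenced guide describes the recommended fair-share setup:

module load slurm
sacctmgr add account research-team Description="Example team account"
sacctmgr add user <cluster-user> Account=research-team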
2.5. Setting Up NGC
As a part of the DGX Cloud subscription, your organization has received access to NVIDIA NGC, with Private Registry and NVIDIA AI Enterprise subscriptions enabled. As the cluster owner, you will be responsible for managing your NGC org, and inviting your admins and users to NGC.
For more information on setting up your NGC org, please see the NGC User Guide.
To invite users to the NGC org, follow the steps in the NGC User Guide here.
3. Cluster Admin Steps (Optional)
Cluster admins can manage the configuration of the Slurm job scheduler, run jobs, and execute tasks that require sudo access from the login, CPU, and GPU nodes. Additionally, cluster admins can onboard other cluster admins and users via Base View.
Note
The following sections assume your cluster owner created a cluster admin user. If you haven’t been set up with an admin user and an SSH key pair for logging into the cluster, please contact your cluster owner to get onboarded.
3.1. Accessing the Login Node
As a cluster admin, you have SSH access to the login nodes but not to the head node. Cluster admins can also access Base View via the login nodes.
To access the login node, follow these steps:
Obtain the login node IPs from the cluster owner.
Log in via SSH with the user account(s) created by the cluster owner:
ssh -i /path/to/ssh_cert <cluster-admin>@ip-addr-of-login-node
Note
If you encounter any errors while trying SSH, refer to the Troubleshooting section for help.
3.2. Accessing Base View
Base View is a browser-based GUI that provides a dashboard view of the cluster.
Refer to the Accessing Base View section of the NVIDIA DGX Cloud Cluster Administration Guide for details.
3.3. Adding Users Via Base View
Note
The steps in this section do not need to be performed if the cluster owner has already completed the user creation via cmsh. If the cluster admin is creating users via Base View, proceed with the steps below.
For more information about creating and onboarding users in Base View, refer to the Adding Users Via Base View section of the NVIDIA DGX Cloud Cluster Administration Guide.
4. Cluster User Steps
Cluster users can perform the following actions on the login nodes:
Use Slurm commands such as sinfo and squeue to determine the state of the Slurm job queue
Interact with NFS and Lustre storage attached to the cluster
Target jobs between CPU and GPU nodes depending on the use case
Schedule blocking or interactive jobs on the Slurm job queue
Schedule batch jobs on the Slurm job queue
Note
The following sections assume that your cluster admin has worked with you to create a cluster user. If you do not have a user and SSH key pair for logging in to the cluster yet, please contact your cluster admin to get onboarded.
4.1. Accessing the Login Node
Cluster users will have SSH access to the login nodes only. Cluster users can also access the User Portal through the login nodes.
To access the login node, follow these steps:
Obtain the login node IPs from your cluster admin.
Log in via SSH with the user account(s) created by the cluster admin:
ssh -i /path/to/ssh_cert <cluster-user>@ip-addr-of-login-node
Note
If you encounter any errors while trying SSH, refer to the Troubleshooting section for help.
4.2. Setting Up NGC Integration
For more information on setting up your user account to be able to pull containers from NGC, refer to Setting Up NGC Integration in the DGX Cloud Cluster User Guide.
4.3. Running Jobs
The following sections guide you on how to set up and run basic jobs from the login nodes.
4.3.1. Loading Slurm Modules
To interact with software that has been installed in a DGX Cloud cluster, the appropriate modules must be loaded. Modules provide a quick method for loading and unloading specific sets of software and configuration data in a DGX Cloud environment. For more information about modules, see section 2.2 of the Base Command Manager administrator manual.
To load the Slurm module, run the following command:
module load slurm
If you would like to configure your user account to load the Slurm module automatically, run the following command, log out, then log back into the login node.
module initadd slurm
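After logging back in, you can confirm that the module loaded automatically and that Slurm responds; output will vary by cluster:

module list
sinfo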
4.3.2. Running a Single-Node Job
The example below runs a common single node GPU-based job with the NCCL tests tool from an NGC container. Refer to Single-node Jobs in the DGX Cloud Cluster User Guide for more information.
Create a script at $HOME/run-sn.sh using the text editor of your choice, with the following content:
#!/bin/bash

# should be encoded in enroot/environment.d/...
export PMIX_MCA_gds=hash
export PMIX_MCA_psec=native
export OMPI_MCA_coll_hcoll_enable=0
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_PCI_RELAXED_ORDERING=1
export NCCL_TOPO_FILE=/cm/shared/etc/ndv4-topo.xml
export MELLANOX_VISIBLE_DEVICES=all
export UCX_TLS=rc
export UCX_NET_DEVICES=mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1

export NCCL_PROTO=LL,LL128,Simple
export NCCL_ALGO=Tree,Ring,CollnetDirect,CollnetChain,NVLS

srun -N1 --exclusive --gpus-per-node 8 --mpi=pmix --container-image nvcr.io#nvidia/pytorch:24.09-py3 -p defq all_reduce_perf_mpi -b 1G -e 4G -f 2 -g 8
Make the script executable by running the following command:
chmod +x $HOME/run-sn.sh
Now you can run the script:
cd $HOME
./run-sn.sh
You should see output similar to the following example:
pyxis: imported docker image: nvcr.io#nvidia/pytorch:24.09-py3
# nThread 1 nGpus 8 minBytes 1073741824 maxBytes 4294967296 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid 848629 on     gpu008 device  0 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  1 Group  0 Pid 848629 on     gpu008 device  1 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  2 Group  0 Pid 848629 on     gpu008 device  2 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  3 Group  0 Pid 848629 on     gpu008 device  3 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  4 Group  0 Pid 848629 on     gpu008 device  4 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  5 Group  0 Pid 848629 on     gpu008 device  5 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  6 Group  0 Pid 848629 on     gpu008 device  6 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  7 Group  0 Pid 848629 on     gpu008 device  7 [0x00] NVIDIA A100-SXM4-80GB
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
  1073741824     268435456     float     sum      -1   8215.7  130.69  228.71      0   8214.6  130.71  228.75      0
  2147483648     536870912     float     sum      -1    16274  131.95  230.92      0    16273  131.97  230.95      0
  4294967296    1073741824     float     sum      -1    32231  133.25  233.20      0    33012  130.10  227.68      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 230.034
#
4.3.3. Running a Multi-Node Job
The example below runs a multi-node variant of the GPU-based job above with the NCCL tests tool from an NGC container. For more information on multi-node jobs, refer to Multi-node Jobs in the NVIDIA DGX Cloud Cluster User Guide.
Create a script at $HOME/run-mn.sh using the text editor of your choice, with the following content:
#!/bin/bash

export OMPI_MCA_coll_hcoll_enable=0
export UCX_TLS=rc
export UCX_NET_DEVICES=mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_PCI_RELAXED_ORDERING=1
export NCCL_TOPO_FILE=/cm/shared/etc/ndv4-topo.xml
export NCCL_PROTO=LL,LL128,Simple
export NCCL_ALGO=Tree,Ring,CollnetDirect,CollnetChain,NVLS
export MELLANOX_VISIBLE_DEVICES=all
export PMIX_MCA_gds=hash
export PMIX_MCA_psec=native

srun -N2 --exclusive --gpus-per-node 8 --mpi=pmix --container-image nvcr.io/nvidia/pytorch:24.09-py3 -p defq all_reduce_perf_mpi -b 1G -e 4G -f 2 -g 8
Make the script executable by running the following command:
chmod +x $HOME/run-mn.sh
Now you can run the script:
./run-mn.sh
You should see output similar to the following example:
pyxis: imported docker image: nvcr.io#nvidia/pytorch:24.09-py3
pyxis: imported docker image: nvcr.io#nvidia/pytorch:24.09-py3
# nThread 1 nGpus 8 minBytes 1073741824 maxBytes 4294967296 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid 824960 on     gpu005 device  0 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  1 Group  0 Pid 824960 on     gpu005 device  1 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  2 Group  0 Pid 824960 on     gpu005 device  2 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  3 Group  0 Pid 824960 on     gpu005 device  3 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  4 Group  0 Pid 824960 on     gpu005 device  4 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  5 Group  0 Pid 824960 on     gpu005 device  5 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  6 Group  0 Pid 824960 on     gpu005 device  6 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  7 Group  0 Pid 824960 on     gpu005 device  7 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  8 Group  0 Pid 822704 on     gpu006 device  0 [0x00] NVIDIA A100-SXM4-80GB
#  Rank  9 Group  0 Pid 822704 on     gpu006 device  1 [0x00] NVIDIA A100-SXM4-80GB
#  Rank 10 Group  0 Pid 822704 on     gpu006 device  2 [0x00] NVIDIA A100-SXM4-80GB
#  Rank 11 Group  0 Pid 822704 on     gpu006 device  3 [0x00] NVIDIA A100-SXM4-80GB
#  Rank 12 Group  0 Pid 822704 on     gpu006 device  4 [0x00] NVIDIA A100-SXM4-80GB
#  Rank 13 Group  0 Pid 822704 on     gpu006 device  5 [0x00] NVIDIA A100-SXM4-80GB
#  Rank 14 Group  0 Pid 822704 on     gpu006 device  6 [0x00] NVIDIA A100-SXM4-80GB
#  Rank 15 Group  0 Pid 822704 on     gpu006 device  7 [0x00] NVIDIA A100-SXM4-80GB
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
  1073741824     268435456     float     sum      -1    11214   95.75  179.53      0    11211   95.77  179.58      0
  2147483648     536870912     float     sum      -1    21949   97.84  183.45      0    21629   99.29  186.17      0
  4294967296    1073741824     float     sum      -1    44071   97.46  182.73      0    43494   98.75  185.15      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 182.768
#
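Both examples above invoke srun directly, which blocks your terminal until the job completes. The same workload can be submitted asynchronously with sbatch. The following is a hedged sketch, not an official template: the script name run-mn.sbatch and the output file pattern are arbitrary, and the environment exports and srun line mirror run-mn.sh above.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --gpus-per-node=8
#SBATCH --partition=defq
#SBATCH --output=nccl-test-%j.out

# same environment settings as run-mn.sh above
export OMPI_MCA_coll_hcoll_enable=0
export UCX_TLS=rc
export UCX_NET_DEVICES=mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_PCI_RELAXED_ORDERING=1
export NCCL_TOPO_FILE=/cm/shared/etc/ndv4-topo.xml
export NCCL_PROTO=LL,LL128,Simple
export NCCL_ALGO=Tree,Ring,CollnetDirect,CollnetChain,NVLS
export MELLANOX_VISIBLE_DEVICES=all
export PMIX_MCA_gds=hash
export PMIX_MCA_psec=native

# srun inherits the node and GPU allocation from the #SBATCH directives
srun --mpi=pmix --container-image nvcr.io/nvidia/pytorch:24.09-py3 all_reduce_perf_mpi -b 1G -e 4G -f 2 -g 8
Submit it with sbatch $HOME/run-mn.sbatch, check its status with squeue -u $USER, and read the results from nccl-test-<jobid>.out when it finishes.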
4.4. Accessing the User Portal
The User Portal is a browser-based GUI designed specifically for cluster users, giving them a dashboard of their own workloads in the cluster.
Refer to User Portal in the NVIDIA DGX Cloud Cluster User Guide for more information.
5. Troubleshooting
5.1. SSH Key Permissions
5.1.1. Unprotected Private Key File in WSL
You may see the following error when trying to SSH to the cluster while using WSL on Windows.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0777 for '<ssh-key-file>' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "<ssh-key-file>": bad permissions
<cluster-user>@slogin001: Permission denied (publickey,gssapi-with-mic).
To fix this, you need to update your WSL conf to allow you to own and change the file permissions for the SSH private key:
Create an /etc/wsl.conf file (as sudo) with the following contents:
[automount]
options="metadata"
Exit WSL
Terminate the instance via command prompt (wsl --terminate <distro-name>) or shut it down (wsl --shutdown)
Restart WSL
Then, from WSL, run the following command to change the permissions of the private key:
user@local:$HOME/.ssh$ chmod 600 <ssh-key-file>
Then, check the permissions:
user@local:$HOME/.ssh$ ls -l <ssh-key-file>
It should look like:
-rw------- 1 user local 2610 Apr 2 19:19 <ssh-key-file>
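The same chmod fix applies outside WSL (for example, on macOS or native Linux), where no wsl.conf change is needed. It can also help to restrict the ~/.ssh directory itself:

chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/<ssh-key-file>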
5.2. Base View Permissions
In general, cluster users should use the User Portal rather than Base View as their web UI for the cluster. A cluster user with insufficient permissions who tries to log in to Base View will see an error indicating that their account does not have access.
Base View is primarily intended for cluster admins. Cluster users should access the User Portal instead.