lep job

Manages Lepton Jobs.

Lepton Jobs are for one-time and one-off tasks that run on one or more machines. For example, one can launch a shell script that does a bunch of data processing as a job, or a distributed ML training job over multiple, connected machines. See the documentation for more details.

Usage

lep job [OPTIONS] COMMAND [ARGS]...

Options

  • --help : Show this message and exit.

Commands

  • clone : Creates a copy of an existing job by its ID.
  • create : Creates a job.
  • events : Prints the events of a job by its ID.
  • get : Gets detailed information about jobs.
  • list : Lists all jobs in the current workspace.
  • log : Gets the log of a job.
  • remove : Removes a single job.
  • remove-all : Removes all jobs matching the specified filters.
  • replicas : Prints the replica IDs of a job.
  • start : Starts a job by its ID.
  • stop : Stops a job by its ID.
  • stop-all : Stops all jobs matching the specified filters.

lep job create

Creates a job.

For advanced uses, check https://kubernetes.io/docs/concepts/workloads/controllers/job/.

Usage

lep job create [OPTIONS]

Options

  • -n, --name TEXT : Job name [required]
  • -f, --file TEXT : If specified, load the job spec from the file. Any explicitly passed argument overrides the corresponding value from the file.
  • --container-image TEXT : Container image for the job. If not set, defaults to leptonai.config.BASE_IMAGE.
  • --container-port TEXT : Ports to expose for the job, in the format portnumber[:protocol].
  • --port TEXT : Deprecated flag, use --container-port instead.
  • --command TEXT : Command string to run for the job.
  • --resource-shape TEXT : Resource shape for the pod. Available types are: 'cpu.small', 'cpu.medium', 'cpu.large', 'gpu.a10', 'gpu.a10.6xlarge', 'gpu.a100-40gb', 'gpu.2xa100-40gb', 'gpu.4xa100-40gb', 'gpu.8xa100-40gb', 'gpu.a100-80gb', 'gpu.2xa100-80gb', 'gpu.4xa100-80gb', 'gpu.8xa100-80gb', 'gpu.h100-sxm', 'gpu.2xh100-sxm', 'gpu.4xh100-sxm', 'gpu.8xh100-sxm'.
  • -w, --num-workers INTEGER : Number of workers to use for the job. For example, when you do a distributed training job of 4 replicas, use --num-workers 4.
  • --max-failure-retry INTEGER : Maximum number of failures to retry per worker.
  • --max-job-failure-retry INTEGER : Maximum number of failures to retry per whole job.
  • -e, --env TEXT : Environment variables to pass to the job, in the format NAME=VALUE.
  • -s, --secret TEXT : Secrets to pass to the job, in the format NAME=SECRET_NAME. If secret name is also the environment variable name, you can omit it and simply pass SECRET_NAME.
  • --mount TEXT : Persistent storage to be mounted to the job, in the format STORAGE_PATH:MOUNT_PATH or STORAGE_PATH:MOUNT_PATH:MOUNT_FROM.
  • --image-pull-secrets TEXT : Secrets to use for pulling images.
  • --intra-job-communication BOOLEAN : Enable intra-job communication. If --num-workers is set, this is automatically enabled.
  • --privileged : Run the job in privileged mode.
  • --ttl-seconds-after-finished INTEGER : (Advanced feature) Limits the lifetime of a job that has finished execution (either Completed or Failed). If not set, defaults to 72 hours. Ref: https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
  • -lg, --log-collection BOOLEAN : Enable or disable log collection (true/false). If not provided, the workspace setting will be used.
  • -ng, --node-group TEXT : Node group for the job. If not set, on-demand resources are used. You can repeat this flag to name multiple node groups, but multiple node groups are not yet supported (coming soon for enterprise users); for now, only the first node group is used.
  • -ni, --node-id TEXT : Node for the job. You can repeat this flag to choose multiple nodes. Specify the node group when using this option.
  • -qp, --queue-priority TEXT : Sets the priority for this job (available only for dedicated node groups). One of low-1, low-2, low-3, medium-4, medium-5, medium-6, high-7, high-8, high-9. You can also pass a number from 1 to 9 or a keyword: l / low (will be 1), m / medium (will be 4), h / high (will be 7). Examples: -qp 1, -qp 9, -qp low, -qp medium, -qp high, -qp l, -qp m, -qp h
  • --visibility TEXT : Visibility of the job. Can be 'public' or 'private'. If private, the job will only be viewable by the creator and workspace admin.
  • --shared-memory-size INTEGER : Specify the shared memory size for this job, in MiB.
  • --with-reservation TEXT : Assign the job to a specific reserved compute resource using a reservation ID (only applicable to dedicated node groups). If not provided, the job will be scheduled as usual.
  • --help : Show this message and exit.
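
As a rough illustration of how these flags combine, the Python sketch below assembles an argument list for lep job create that could be handed to subprocess.run. The helper name build_create_args and all sample values (job name, command, secret name, storage paths) are hypothetical; only the flags themselves come from the option list above. Repeating --env, --secret, and --mount once per value is an assumption about how multiple values are passed.

```python
import shlex


def build_create_args(name, command, resource_shape=None, num_workers=None,
                      env=None, secrets=None, mounts=None):
    """Assemble an argv list for `lep job create` (illustrative helper)."""
    args = ["lep", "job", "create", "--name", name, "--command", command]
    if resource_shape is not None:
        args += ["--resource-shape", resource_shape]
    if num_workers is not None:
        args += ["--num-workers", str(num_workers)]
    # Assumption: --env, --secret and --mount are repeated once per value.
    for key, value in (env or {}).items():
        args += ["--env", f"{key}={value}"]  # NAME=VALUE
    for secret in secrets or []:
        args += ["--secret", secret]  # NAME=SECRET_NAME or just SECRET_NAME
    for storage_path, mount_path in mounts or []:
        args += ["--mount", f"{storage_path}:{mount_path}"]  # STORAGE_PATH:MOUNT_PATH
    return args


# Hypothetical job: every value below is a placeholder.
args = build_create_args(
    name="data-prep",
    command="python preprocess.py",
    resource_shape="cpu.small",
    num_workers=2,
    env={"STAGE": "dev"},
    secrets=["HF_TOKEN"],
    mounts=[("/datasets", "/mnt/datasets")],
)

# Print the shell-quoted command for review instead of running it.
print(shlex.join(args))
```

Passing the list to subprocess.run(args, check=True) would launch the job; printing it first makes the command easy to review.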

lep job list

Lists all jobs in the current workspace.

You can filter jobs by:

  • State: Case-insensitive prefix match (e.g., 'run' matches 'Running')
  • User: Case-insensitive prefix match (e.g., 'alice' matches 'alice123')
  • Name/ID: Case-insensitive substring match (e.g., 'train' matches 'training-job-123')
  • Node Group: Case-insensitive substring match

Multiple filters can be combined. For example: lep job list -s queue -u alice -n train -ng h100

Usage

lep job list [OPTIONS]

Options

  • -s, --state TEXT : Filter jobs by state. Case-insensitive and matches the beginning of the state name. Available states: Starting, Running, Failed, Completed, Stopped, Stopping, Deleting, Deleted, Restarting, Archived, Queueing, Awaiting, PendingRetry. Example: 'run' will match 'Running'. Can specify multiple states.
  • -u, --user TEXT : Filter jobs by user. Case-insensitive and matches the beginning of the username. Can specify multiple users. Example: 'alice' will match 'alice123'.
  • -n, --name-or-id TEXT : Filter jobs by name or id. Case-insensitive and matches any part of the name or id. Can specify multiple names or ids. Example: 'train' will match 'training-job-123'.
  • -ng, --node-group TEXT : Filter jobs by node group. Case-insensitive and matches any part of the node group name.
  • --help : Show this message and exit.
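
The matching rules for lep job list can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not the CLI's actual implementation: the job dictionaries and their field names are hypothetical, and treating multiple values of one filter as "any of" while requiring every filter to match is an assumption.

```python
def _as_list(value):
    """Accept None, a single string, or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def _prefix(field, patterns):
    # Case-insensitive prefix match; an empty filter matches everything.
    return not patterns or any(field.lower().startswith(p.lower()) for p in patterns)


def _substring(field, patterns):
    # Case-insensitive substring match; an empty filter matches everything.
    return not patterns or any(p.lower() in field.lower() for p in patterns)


def job_matches(job, state=None, user=None, name_or_id=None, node_group=None):
    """Model of the documented filter rules (assumed: all filters must match)."""
    names = _as_list(name_or_id)
    return (
        _prefix(job["state"], _as_list(state))
        and _prefix(job["user"], _as_list(user))
        and (_substring(job["name"], names) or _substring(job["id"], names))
        and _substring(job["node_group"], _as_list(node_group))
    )


# Hypothetical jobs used only to exercise the rules.
jobs = [
    {"id": "job-001", "name": "training-job-123", "state": "Queueing",
     "user": "alice123", "node_group": "h100-pool"},
    {"id": "job-002", "name": "eval-run", "state": "Running",
     "user": "bob", "node_group": "a100-pool"},
]

# Mirrors: lep job list -s queue -u alice -n train -ng h100
selected = [j["id"] for j in jobs
            if job_matches(j, state="queue", user="alice",
                           name_or_id="train", node_group="h100")]
print(selected)
```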

lep job remove-all

Removes all jobs matching the specified filters. At least one filter must be provided. For safety, name and user filters require exact matches. State filter remains flexible. The --user option is required to prevent accidental operations on other users' jobs.

Usage

lep job remove-all [OPTIONS]

Options

  • -s, --state TEXT : Filter jobs by state. Case-insensitive and matches the beginning of the state name. Available states: Starting, Running, Failed, Completed, Stopped, Stopping, Deleting, Deleted, Restarting, Archived, Queueing, Awaiting, PendingRetry. Example: 'run' will match 'Running'. Can specify multiple states.
  • -u, --user TEXT : Filter jobs by exact username match. Case-sensitive. Can specify multiple users. For safety, this is an exact match. This option is required to prevent accidental operations on other users' jobs. [required]
  • -n, --name TEXT : Filter jobs by exact name match. Case-sensitive. Can specify multiple names. For safety, this is an exact match.
  • -ng, --node-group TEXT : Filter jobs by node group. Case-insensitive and matches any part of the node group name.
  • --help : Show this message and exit.

lep job stop-all

Stops all jobs matching the specified filters. At least one filter must be provided. For safety, name and user filters require exact matches. State filter remains flexible. The --user option is required to prevent accidental operations on other users' jobs.

Usage

lep job stop-all [OPTIONS]

Options

  • -s, --state TEXT : Filter jobs by state. Case-insensitive and matches the beginning of the state name. Available states: Starting, Running, Failed, Completed, Stopped, Stopping, Deleting, Deleted, Restarting, Archived, Queueing, Awaiting, PendingRetry. Example: 'run' will match 'Running'. Can specify multiple states.
  • -u, --user TEXT : Filter jobs by exact username match. Case-sensitive. Can specify multiple users. For safety, this is an exact match. This option is required to prevent accidental operations on other users' jobs. [required]
  • -n, --name TEXT : Filter jobs by exact name match. Case-sensitive. Can specify multiple names. For safety, this is an exact match.
  • -ng, --node-group TEXT : Filter jobs by node group. Case-insensitive and matches any part of the node group name.
  • --help : Show this message and exit.
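
Because lep job stop-all and lep job remove-all accept the same filters, one small helper can build the argument list for either. The sketch below is a hypothetical wrapper (build_bulk_args and the sample usernames and job names are not part of the CLI); it enforces the documented requirement that --user is always given, and it assumes multiple values are passed by repeating the option.

```python
import shlex


def build_bulk_args(action, users, states=(), names=(), node_group=None):
    """Assemble an argv list for `lep job stop-all` or `lep job remove-all`."""
    if action not in ("stop-all", "remove-all"):
        raise ValueError(f"unsupported action: {action!r}")
    if not users:
        # The reference marks --user as required for both commands.
        raise ValueError("at least one --user value is required")
    args = ["lep", "job", action]
    # Assumption: each option is repeated once per value.
    for user in users:
        args += ["--user", user]  # exact, case-sensitive match
    for state in states:
        args += ["--state", state]  # case-insensitive prefix match
    for name in names:
        args += ["--name", name]  # exact, case-sensitive match
    if node_group is not None:
        args += ["--node-group", node_group]
    return args


# Hypothetical cleanup: stop running jobs, then remove finished ones.
stop_cmd = build_bulk_args("stop-all", users=["alice123"], states=["run"])
remove_cmd = build_bulk_args(
    "remove-all",
    users=["alice123"],
    states=["stopped", "failed"],
    names=["training-job-123"],
)
print(shlex.join(stop_cmd))
print(shlex.join(remove_cmd))
```

Running lep job list with similar filters first is a cheap way to preview which jobs exist, keeping in mind that list matches names and users more loosely than these two commands do.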

lep job get

Gets detailed information about jobs.

You can search by either name or id:

  • If searching by name, returns all jobs with that exact name
  • If searching by id, returns the specific job with that id

Args:

  • name: Job name to search for (exact match)
  • id: Job id to search for (exact match)

Usage

lep job get [OPTIONS]

Options

  • -n, --name TEXT : Job name
  • -i, --id TEXT : Job id
  • --help : Show this message and exit.

lep job remove

Removes a single job.

You can remove a job by either name or id:

  • If removing by name, only the newest job with that exact name will be removed
  • If removing by id, the specific job with that id will be removed

For removing multiple jobs with the same name, use 'lep job remove-all' instead.

Args:

  • id: Job id to remove (exact match)
  • name: Job name to remove (exact match, removes only the newest matching job)

Usage

lep job remove [OPTIONS]

Options

  • -i, --id TEXT : The ID of the job to remove.
  • -n, --name TEXT : The name of the job to remove. If multiple jobs share the same name, only the newest one will be removed.
  • --help : Show this message and exit.
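
The difference between removing by id and removing by name can be sketched as a small selection function. This models the behavior described above and is not the CLI's own code; the job records and the created_at field used to decide which job is newest are hypothetical.

```python
def select_job_to_remove(jobs, job_id=None, name=None):
    """Return the single job that `lep job remove` would target, or None."""
    if job_id is not None:
        # By id: the specific job with that exact id.
        matches = [j for j in jobs if j["id"] == job_id]
        return matches[0] if matches else None
    if name is not None:
        # By name: exact match, and only the newest job is selected.
        matches = [j for j in jobs if j["name"] == name]
        return max(matches, key=lambda j: j["created_at"]) if matches else None
    raise ValueError("provide either job_id or name")


# Hypothetical jobs; created_at is an assumed numeric timestamp.
jobs = [
    {"id": "job-001", "name": "nightly-etl", "created_at": 100},
    {"id": "job-002", "name": "nightly-etl", "created_at": 200},
    {"id": "job-003", "name": "eval-run", "created_at": 150},
]

print(select_job_to_remove(jobs, name="nightly-etl")["id"])
print(select_job_to_remove(jobs, job_id="job-001")["id"])
```

In this model the older nightly-etl job survives a removal by name, which is the case where lep job remove-all is the better fit.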

lep job clone

Creates a copy of an existing job by its ID.

The cloned job will:

  • Have the same configuration as the original job
  • Have a new name with '-clone' suffix

Args: id: ID of the job to clone

Usage

lep job clone [OPTIONS]

Options

  • -i, --id TEXT : The job id to clone. [required]
  • --help : Show this message and exit.

lep job log

Gets the log of a job. If replica is not specified, the first replica is selected. Otherwise, the log of the specified replica is shown. To get the list of replicas, use lep job replicas.

Usage

lep job log [OPTIONS]

Options

  • -i, --id TEXT : The id of the job to get the log for. [required]
  • -r, --replica TEXT : The name of the replica to get the log for.
  • --help : Show this message and exit.
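
A typical sequence is to look up a job's replicas with lep job replicas and then request the log of one of them. The sketch below only builds the two argument lists, using the documented --id and --replica options; the helper names, the job id, and the replica name are hypothetical placeholders.

```python
import shlex


def replicas_args(job_id):
    """Argv for listing the replicas of a job."""
    return ["lep", "job", "replicas", "--id", job_id]


def log_args(job_id, replica=None):
    """Argv for fetching a job log, optionally for one named replica."""
    args = ["lep", "job", "log", "--id", job_id]
    if replica is not None:
        # Without --replica, the first replica is selected.
        args += ["--replica", replica]
    return args


job_id = "job-001"  # hypothetical job id
replica = "job-001-worker-1"  # hypothetical replica name

print(shlex.join(replicas_args(job_id)))
print(shlex.join(log_args(job_id)))
print(shlex.join(log_args(job_id, replica=replica)))
```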

lep job replicas

Prints the replica IDs of a job.

Usage

lep job replicas [OPTIONS]

Options

  • -i, --id TEXT : The id of the job to list replicas for. [required]
  • --help : Show this message and exit.

lep job stop

Stops a job by its ID.

Args: id: ID of the job to stop

Usage

lep job stop [OPTIONS]

Options

  • -i, --id TEXT : The job id to stop. [required]
  • --help : Show this message and exit.

lep job start

Starts a job by its ID.

Args: id: ID of the job to start

Usage

lep job start [OPTIONS]

Options

  • -i, --id TEXT : The job id to start. [required]
  • --help : Show this message and exit.

lep job events

Prints the events of a job by its ID.

Args: id: ID of the job to get events

Usage

lep job events [OPTIONS]

Options

  • -i, --id TEXT : The id of the job to get events for. [required]
  • --help : Show this message and exit.
Copyright @ 2025, NVIDIA Corporation.