lep job
Manages Lepton Jobs.
Lepton Jobs are for one-time and one-off tasks that run on one or more machines. For example, one can launch a shell script that does a bunch of data processing as a job, or a distributed ML training job over multiple, connected machines. See the documentation for more details.
Usage
lep job [OPTIONS] COMMAND [ARGS]...
Options
--help
: Show this message and exit.
Commands
clone
: Creates a copy of an existing job by its ID.
create
: Creates a job.
events
: Prints the events of a job by its ID.
get
: Gets detailed information about jobs.
list
: Lists all jobs in the current workspace.
log
: Gets the log of a job.
remove
: Removes a single job.
remove-all
: Removes all jobs matching the specified filters.
replicas
: Prints the replica IDs of a job.
start
: Starts a job by its ID.
stop
: Stops a job by its ID.
stop-all
: Stops all jobs matching the specified filters.
lep job create
Creates a job.
For advanced uses, check https://kubernetes.io/docs/concepts/workloads/controllers/job/.
Usage
lep job create [OPTIONS]
Options
-n, --name TEXT
: Job name. [required]
-f, --file TEXT
: If specified, load the job spec from the file. Any explicitly passed argument overrides the corresponding value from the file.
--container-image TEXT
: Container image for the job. If not set, defaults to leptonai.config.BASE_IMAGE.
--container-port TEXT
: Ports to expose for the job, in the format portnumber[:protocol].
--port TEXT
: Deprecated flag; use --container-port instead.
--command TEXT
: Command string to run for the job.
--resource-shape TEXT
: Resource shape for the pod. Available types are: 'cpu.small', 'cpu.medium', 'cpu.large', 'gpu.a10', 'gpu.a10.6xlarge', 'gpu.a100-40gb', 'gpu.2xa100-40gb', 'gpu.4xa100-40gb', 'gpu.8xa100-40gb', 'gpu.a100-80gb', 'gpu.2xa100-80gb', 'gpu.4xa100-80gb', 'gpu.8xa100-80gb', 'gpu.h100-sxm', 'gpu.2xh100-sxm', 'gpu.4xh100-sxm', 'gpu.8xh100-sxm'.
-w, --num-workers INTEGER
: Number of workers to use for the job. For example, for a distributed training job with 4 replicas, use --num-workers 4.
--max-failure-retry INTEGER
: Maximum number of failures to retry per worker.
--max-job-failure-retry INTEGER
: Maximum number of failures to retry for the whole job.
-e, --env TEXT
: Environment variables to pass to the job, in the format NAME=VALUE.
-s, --secret TEXT
: Secrets to pass to the job, in the format NAME=SECRET_NAME. If the secret name is also the environment variable name, you can omit it and simply pass SECRET_NAME.
--mount TEXT
: Persistent storage to be mounted to the job, in the format STORAGE_PATH:MOUNT_PATH or STORAGE_PATH:MOUNT_PATH:MOUNT_FROM.
--image-pull-secrets TEXT
: Secrets to use for pulling images.
--intra-job-communication BOOLEAN
: Enable intra-job communication. If --num-workers is set, this is automatically enabled.
--privileged
: Run the job in privileged mode.
--ttl-seconds-after-finished INTEGER
: (Advanced feature) Limits the lifetime of a job that has finished execution (either Completed or Failed). If not set, defaults to 72 hours. Ref: https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
-lg, --log-collection BOOLEAN
: Enable or disable log collection (true/false). If not provided, the workspace setting is used.
-ng, --node-group TEXT
: Node group for the job. If not set, on-demand resources are used. You can repeat this flag to choose multiple node groups. Multiple node groups are not currently supported (coming soon for enterprise users), so only the first node group is used if you pass several.
-ni, --node-id TEXT
: Node for the job. You can repeat this flag to choose multiple nodes. Specify the node group when using this option.
-qp, --queue-priority TEXT
: Set the priority for this job (available only for dedicated node groups). One of low-1, low-2, low-3, medium-4, medium-5, medium-6, high-7, high-8, high-9. You can also pass a number 1-9 or a keyword: l / low (will be 1), m / medium (will be 4), h / high (will be 7). Examples: -qp 1, -qp 9, -qp low, -qp medium, -qp high, -qp l, -qp m, -qp h.
--visibility TEXT
: Visibility of the job. Can be 'public' or 'private'. If private, the job is viewable only by the creator and the workspace admin.
--shared-memory-size INTEGER
: Shared memory size for this job, in MiB.
--with-reservation TEXT
: Assign the job to a specific reserved compute resource using a reservation ID (only applicable to dedicated node groups). If not provided, the job is scheduled as usual.
--help
: Show this message and exit.
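As an illustration of how these options combine, here is a minimal sketch that launches a two-worker job. Only the flags come from the option list for this command; the job name, the environment variable, the command string, and the `run` dry-run helper are hypothetical values chosen for the example. By default the helper prints the command instead of executing it.

```shell
# Hypothetical dry-run helper (not part of lep): prints the command;
# set DRY_RUN=0 to execute it instead.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical values; replace with your own.
JOB_NAME="train-demo"

run lep job create \
  --name "$JOB_NAME" \
  --resource-shape gpu.a10 \
  --num-workers 2 \
  --env EPOCHS=3 \
  --command "python train.py"
```

Because --num-workers is set, intra-job communication is enabled automatically, so it does not need to be passed here. Note that the dry-run output joins arguments with spaces, so the quoting around the command string is not shown in the printed line.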
lep job list
Lists all jobs in the current workspace.
You can filter jobs by:
- State: Case-insensitive prefix match (e.g., 'run' matches 'Running')
- User: Case-insensitive prefix match (e.g., 'alice' matches 'alice123')
- Name/ID: Case-insensitive substring match (e.g., 'train' matches 'training-job-123')
- Node Group: Case-insensitive substring match
Multiple filters can be combined. For example: lep job list -s queue -u alice -n train -ng h100
Usage
lep job list [OPTIONS]
Options
-s, --state TEXT
: Filter jobs by state. Case-insensitive and matches the beginning of the state name. Available states: Starting, Running, Failed, Completed, Stopped, Stopping, Deleting, Deleted, Restarting, Archived, Queueing, Awaiting, PendingRetry. Example: 'run' will match 'Running'. Can specify multiple states.
-u, --user TEXT
: Filter jobs by user. Case-insensitive and matches the beginning of the username. Can specify multiple users. Example: 'alice' will match 'alice123'.
-n, --name-or-id TEXT
: Filter jobs by name or id. Case-insensitive and matches any part of the name or id. Can specify multiple names or ids. Example: 'train' will match 'training-job-123'.
-ng, --node-group TEXT
: Filter jobs by node group. Case-insensitive and matches any part of the node group name.
--help
: Show this message and exit.
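The filters can be sketched with a few invocations. The user, name, and node group values are the illustrative ones used in the descriptions for this command, and the `run` dry-run helper is hypothetical. The last line assumes that multiple states are given by repeating the flag, which the option text implies but does not spell out.

```shell
# Hypothetical dry-run helper (not part of lep): prints the command;
# set DRY_RUN=0 to execute it instead.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Prefix match on state: 'run' matches 'Running'.
run lep job list -s run

# Combined filters: state, user, name/id, and node group.
run lep job list -s queue -u alice -n train -ng h100

# Assumption: repeat -s to match more than one state.
run lep job list -s failed -s stopped
```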
lep job remove-all
Removes all jobs matching the specified filters. At least one filter must be provided. For safety, name and user filters require exact matches. State filter remains flexible. The --user option is required to prevent accidental operations on other users' jobs.
Usage
lep job remove-all [OPTIONS]
Options
-s, --state TEXT
: Filter jobs by state. Case-insensitive and matches the beginning of the state name. Available states: Starting, Running, Failed, Completed, Stopped, Stopping, Deleting, Deleted, Restarting, Archived, Queueing, Awaiting, PendingRetry. Example: 'run' will match 'Running'. Can specify multiple states.
-u, --user TEXT
: Filter jobs by exact username match. Case-sensitive. Can specify multiple users. This option is required to prevent accidental operations on other users' jobs. [required]
-n, --name TEXT
: Filter jobs by exact name match. Case-sensitive. Can specify multiple names.
-ng, --node-group TEXT
: Filter jobs by node group. Case-insensitive and matches any part of the node group name.
--help
: Show this message and exit.
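Since removal is destructive, one cautious workflow is to preview the matching jobs with lep job list before removing them. The sketch below assumes a hypothetical username and uses a hypothetical `run` dry-run helper. Keep in mind that the user filter of lep job list is a prefix match while the one here is an exact match, so the preview may show more jobs than will actually be removed.

```shell
# Hypothetical dry-run helper (not part of lep): prints the command;
# set DRY_RUN=0 to execute it instead.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical username; must match exactly (case-sensitive) for remove-all.
USER_NAME="alice123"

# Preview: list this user's failed jobs.
run lep job list -s failed -u "$USER_NAME"

# Remove them; --user is required.
run lep job remove-all -s failed -u "$USER_NAME"
```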
lep job stop-all
Stops all jobs matching the specified filters. At least one filter must be provided. For safety, name and user filters require exact matches. State filter remains flexible. The --user option is required to prevent accidental operations on other users' jobs.
Usage
lep job stop-all [OPTIONS]
Options
-s, --state TEXT
: Filter jobs by state. Case-insensitive and matches the beginning of the state name. Available states: Starting, Running, Failed, Completed, Stopped, Stopping, Deleting, Deleted, Restarting, Archived, Queueing, Awaiting, PendingRetry. Example: 'run' will match 'Running'. Can specify multiple states.
-u, --user TEXT
: Filter jobs by exact username match. Case-sensitive. Can specify multiple users. This option is required to prevent accidental operations on other users' jobs. [required]
-n, --name TEXT
: Filter jobs by exact name match. Case-sensitive. Can specify multiple names.
-ng, --node-group TEXT
: Filter jobs by node group. Case-insensitive and matches any part of the node group name.
--help
: Show this message and exit.
lep job get
Gets detailed information about jobs.
You can search by either name or id:
- If searching by name, returns all jobs with that exact name
- If searching by id, returns the specific job with that id
Args:
name: Job name to search for (exact match)
id: Job id to search for (exact match)
Usage
lep job get [OPTIONS]
Options
-n, --name TEXT
: Job name
-i, --id TEXT
: Job id
--help
: Show this message and exit.
lep job remove
Removes a single job.
You can remove a job by either name or id:
- If removing by name, only the newest job with that exact name will be removed
- If removing by id, the specific job with that id will be removed
For removing multiple jobs with the same name, use 'lep job remove-all' instead.
Args:
id: Job id to remove (exact match)
name: Job name to remove (exact match, removes only the newest matching job)
Usage
lep job remove [OPTIONS]
Options
-i, --id TEXT
: The ID of the job to remove.
-n, --name TEXT
: The name of the job to remove. If multiple jobs share the same name, only the newest one is removed.
--help
: Show this message and exit.
lep job clone
Creates a copy of an existing job by its ID.
The cloned job will:
- Have the same configuration as the original job
- Have a new name with '-clone' suffix
Args: id: ID of the job to clone
Usage
lep job clone [OPTIONS]
Options
-i, --id TEXT
: The job id to clone. [required]
--help
: Show this message and exit.
lep job log
Gets the log of a job. If replica is not specified, the first replica is selected. Otherwise, the log of the specified replica is shown. To get the list of replicas, use lep job replicas.
Usage
lep job log [OPTIONS]
Options
-i, --id TEXT
: The job id to get log. [required]
-r, --replica TEXT
: The replica name to get log.
--help
: Show this message and exit.
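A typical sequence is to look up the replicas of a job first and then request the log of one of them. The sketch below uses placeholder values for the job id and replica name, and a hypothetical `run` dry-run helper. Substitute real values from your workspace.

```shell
# Hypothetical dry-run helper (not part of lep): prints the command;
# set DRY_RUN=0 to execute it instead.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Placeholder values; replace with a real job id and replica name.
JOB_ID="my-job-id"
REPLICA_NAME="my-replica-name"

# Find the replicas of the job.
run lep job replicas -i "$JOB_ID"

# Log of the first replica (no -r given).
run lep job log -i "$JOB_ID"

# Log of a specific replica.
run lep job log -i "$JOB_ID" -r "$REPLICA_NAME"
```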
lep job replicas
Prints the replica IDs of a job.
Usage
lep job replicas [OPTIONS]
Options
-i, --id TEXT
: The job id to get replicas. [required]
--help
: Show this message and exit.
lep job stop
Stops a job by its ID.
Args: id: ID of the job to stop
Usage
lep job stop [OPTIONS]
Options
-i, --id TEXT
: The job id to stop. [required]
--help
: Show this message and exit.
lep job start
Starts a job by its ID.
Args: id: ID of the job to start
Usage
lep job start [OPTIONS]
Options
-i, --id TEXT
: The job id to start. [required]
--help
: Show this message and exit.
lep job events
Prints the events of a job by its ID.
Args: id: ID of the job to get events
Usage
lep job events [OPTIONS]
Options
-i, --id TEXT
: The job id to get events. [required]
--help
: Show this message and exit.