18.1. Pipeline Load Generator CLI (cpload)
cpload is a CLI tool that lets you specify the pipeline(s) used to create jobs, the data set(s) used as job input, and options that control how many jobs are created, how frequently they are created, and so on.
The clara CLI must be installed on the system to use the cpload CLI.
18.1.2.1. generate subcommand
cpload generate [(-r|--recurse)] [(-t|--target) <target>] [(-p|--priority) <priority>] [(-n|--count) <number-of-jobs>] [(-s|--sequence)] [(-w|--delay) <delay-seconds>] <name> <pipeline-id> [-- <data-path>] [--stats]
Creates a new job using <pipeline-id> with a priority of <priority> and the name <name>. Pushes the contents of <data-path> to <target> as input for the new job. When multiple jobs are created, the name of each job is expanded to "cpload-<name>-<count>", where <count> is the ordinal of the job created, starting at one.
<pipeline-id> is expected to be the valid identity of a pipeline known by <target>.
When -- <data-path> is not provided, the job is assumed to have no payload requirement, and no data is uploaded to the payload before the job starts.
When <data-path> is a directory, each child file is used as a separate job input. If <data-path> also contains sub-folders and (-r|--recurse) is provided, the directory is included recursively; if (-r|--recurse) is not provided, an error occurs and no work is generated.
When <data-path> is a file and (-r|--recurse) is provided, an error occurs.
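The <data-path> rules above can be summarized as a small decision function. This is only an illustrative sketch of the documented behavior, not cpload's actual implementation; the function name describe_data_path and the returned labels are hypothetical.

```python
import os


def describe_data_path(data_path, recurse=False):
    """Hypothetical helper: classify <data-path> per the documented rules."""
    if data_path is None:
        # No "-- <data-path>": the job has no payload requirement.
        return "no payload"
    if os.path.isfile(data_path):
        if recurse:
            raise ValueError("--recurse cannot be used when the data path is a file")
        return "single file"
    if not os.path.isdir(data_path):
        raise ValueError(f"data path does not exist: {data_path}")
    with os.scandir(data_path) as entries:
        has_subfolders = any(entry.is_dir() for entry in entries)
    if has_subfolders and not recurse:
        # Documented as an error: no work is generated.
        raise ValueError("data path contains sub-folders; --recurse is required")
    return "recursive directory" if has_subfolders else "flat directory"
```

A caller could run such a check before invoking cpload generate, so that a bad path fails before any job is submitted.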
<target> is assumed to be localhost, or the current remote set by cpload logon, unless (-t|--target) <target> is provided.
<priority> is assumed to be normal unless (-p|--priority) <priority> is provided. Valid priority values are low, normal, high, or immediate.
<number-of-jobs> jobs are created on <target>. <number-of-jobs> is assumed to be 1 unless (-n|--count) <number-of-jobs> is provided.
When <number-of-jobs> is greater than 1, <data-path> is a directory, and (-s|--sequence) is provided, <data-path> must contain only a series of folders that are named in sequence (e.g. data/1, data/2, data/3, etc.). If <data-path> contains any folders that do not fit the sequence, or contains any files, an error occurs.
The sub-folders are treated as individual job inputs and are uploaded as such in a round-robin fashion. For example, if <data-path> has three child folders and five jobs are being created, the folders are used as job input data in the following manner:
job 1 uses folder 1
job 2 uses folder 2
job 3 uses folder 3
job 4 uses folder 1
job 5 uses folder 2
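The round-robin mapping listed above, together with the "cpload-<name>-<count>" naming rule, can be expressed in a few lines. The sketch below is illustrative only; assign_inputs is a hypothetical name and does not correspond to anything exposed by cpload.

```python
def assign_inputs(name, job_count, inputs):
    """Pair each generated job name with an input, cycling through the inputs."""
    if job_count < 1:
        raise ValueError("number-of-jobs must be at least 1")
    if not inputs:
        raise ValueError("at least one input is required")
    return [
        (f"cpload-{name}-{index + 1}", inputs[index % len(inputs)])
        for index in range(job_count)
    ]


# Five jobs over three folders: jobs 4 and 5 wrap around to folders 1 and 2.
for job_name, folder in assign_inputs("sample-load-name", 5, ["1", "2", "3"]):
    print(job_name, "uses folder", folder)
```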
When (-r|--recurse) and (-s|--sequence) are combined, the top-level directory must contain only folders that are named in sequence. Each sequence-named sub-folder is then treated as an individual job input and read recursively (see above for how recursive directories are handled).
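A pre-flight check for a sequence directory might look like the following. It is a sketch under stated assumptions: the folder names are taken to be the plain integers 1, 2, 3, ... with no gaps (matching the data/1, data/2, data/3 example), and sequence_folders is a hypothetical helper name. How cpload itself decides whether a folder "fits the sequence" is not specified here.

```python
import os


def sequence_folders(data_path):
    """Return the sequence sub-folders in order, or raise ValueError."""
    names = []
    with os.scandir(data_path) as entries:
        for entry in entries:
            if not entry.is_dir():
                raise ValueError(f"unexpected file in sequence directory: {entry.name}")
            names.append(entry.name)
    if not names:
        raise ValueError("sequence directory is empty")
    for name in names:
        # Assumption: names are canonical decimal integers ("1", not "01").
        if not (name.isascii() and name.isdigit() and str(int(name)) == name):
            raise ValueError(f"folder does not fit the sequence: {name}")
    numbers = sorted(int(name) for name in names)
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError("folders must be numbered 1, 2, 3, ... without gaps")
    return [os.path.join(data_path, str(number)) for number in numbers]
```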
When <number-of-jobs> is a non-numeric value or a value less than 1, an error occurs and no jobs are generated.
When <data-path> is a file and (-s|--sequence) is provided, an error occurs and no jobs are generated.
When <number-of-jobs> is greater than 1, (-w|--delay) can be used to provide a <delay-seconds> value. <delay-seconds> is assumed to be zero unless (-w|--delay) is provided. The generator waits at least <delay-seconds> seconds between sending jobs to <target>.
When <delay-seconds> is a non-numeric value or a value less than 0, an error occurs and no jobs are generated.
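The validation and pacing rules for <number-of-jobs> and <delay-seconds> can be sketched as follows. This is not cpload's code: parse_count, parse_delay, and submit_with_delay are hypothetical names, and accepting fractional delays is an assumption (only "non-numeric" and "less than 0" are stated as errors).

```python
import math
import time


def parse_count(text):
    """Parse <number-of-jobs>; non-numeric or < 1 is an error."""
    try:
        value = int(text)
    except ValueError:
        raise ValueError("number-of-jobs must be numeric") from None
    if value < 1:
        raise ValueError("number-of-jobs must be at least 1")
    return value


def parse_delay(text):
    """Parse <delay-seconds>; non-numeric or < 0 is an error."""
    try:
        value = float(text)  # assumption: fractional seconds are allowed
    except ValueError:
        raise ValueError("delay-seconds must be numeric") from None
    if not math.isfinite(value) or value < 0:
        raise ValueError("delay-seconds must be a non-negative number")
    return value


def submit_with_delay(jobs, submit, delay_seconds=0.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call submit(job) for each job, waiting at least delay_seconds between calls."""
    last_sent = None
    for job in jobs:
        if last_sent is not None:
            remaining = delay_seconds - (clock() - last_sent)
            if remaining > 0:
                sleep(remaining)
        last_sent = clock()
        submit(job)
```

The sleep and clock parameters exist only so the pacing can be exercised without real waiting.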
When --stats is used, cpload collects job statistics and returns control after all jobs are in the stopped state.
18.1.2.1.1. Basic Sample Usage
# create pipeline using clara CLI
$ clara create pipeline -p sample-pipeline.yaml
PIPELINE_ID: <sample-pipeline-id>
# dataset directory containing 3 job inputs
$ tree /path/to/my/dataset/dir
/path/to/my/dataset/dir
├── data1
│   └── input
├── data2
│   └── input
└── data3
    └── input
# generate 5 jobs with 3 job inputs with no delay
# the 5 jobs use the 3 inputs in round-robin fashion
# The initial input order is not guaranteed, but once selected, that order is reused in round-robin fashion
$ cpload generate -r -n 5 sample-load-name <sample-pipeline-id> -- /path/to/my/dataset/dir
cpload-sample-load-name-1(2c0c0c737f8944ca853c9601fd8f1681): Job Start request submitted successfully with JOB_STATUS_HEALTHY
cpload-sample-load-name-2(69a88387e70f40f38a8d54b66095cffb): Job Start request submitted successfully with JOB_STATUS_HEALTHY
cpload-sample-load-name-3(8ee8c78f75f04db3a0d13f697d028bf0): Job Start request submitted successfully with JOB_STATUS_HEALTHY
cpload-sample-load-name-4(5ebd4c256f354a579ac23d2b9534167e): Job Start request submitted successfully with JOB_STATUS_HEALTHY
cpload-sample-load-name-5(5470bd9ada6f491d8f06f7e5ade4aa67): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# generate 5 jobs with 3 job inputs with 2 seconds of delay between successive jobs
$ cpload generate -r -w 2 -n 5 sample-load-name <sample-pipeline-id> -- /path/to/my/dataset/dir
cpload-sample-load-name-1(2c0c0c737f8944ca853c9601fd8f1681): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# 2 seconds of delay
cpload-sample-load-name-2(69a88387e70f40f38a8d54b66095cffb): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# 2 seconds of delay
cpload-sample-load-name-3(8ee8c78f75f04db3a0d13f697d028bf0): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# 2 seconds of delay
cpload-sample-load-name-4(5ebd4c256f354a579ac23d2b9534167e): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# 2 seconds of delay
cpload-sample-load-name-5(5470bd9ada6f491d8f06f7e5ade4aa67): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# dataset directory containing 3 sequential job inputs
$ tree /path/to/my/sequential/dataset/dir
/path/to/my/sequential/dataset/dir
├── 1
│   └── input
├── 2
│   └── input
└── 3
    └── input
# generate 5 jobs with 3 "sequential" job inputs with 2 seconds of delay between successive jobs
# if the "--sequence" flag is used, the datasets must be sequential.
# A sequential dataset directory must contain only directories. These directories must be numbered 1, 2, 3, and so on.
# The initial input order is guaranteed (1, 2, 3, ...) and is reused in round-robin fashion
$ cpload generate -r -s -w 2 -n 5 sample-load-name <sample-pipeline-id> -- /path/to/my/sequential/dataset/dir
# job with data in directory /path/to/my/sequential/dataset/dir/1 is launched
cpload-sample-load-name-1(2c0c0c737f8944ca853c9601fd8f1681): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# 2 seconds of delay
# job with data in directory /path/to/my/sequential/dataset/dir/2 is launched
cpload-sample-load-name-2(69a88387e70f40f38a8d54b66095cffb): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# 2 seconds of delay
# job with data in directory /path/to/my/sequential/dataset/dir/3 is launched
cpload-sample-load-name-3(8ee8c78f75f04db3a0d13f697d028bf0): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# 2 seconds of delay
# job with data in directory /path/to/my/sequential/dataset/dir/1 is launched
cpload-sample-load-name-4(5ebd4c256f354a579ac23d2b9534167e): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# 2 seconds of delay
# job with data in directory /path/to/my/sequential/dataset/dir/2 is launched
cpload-sample-load-name-5(5470bd9ada6f491d8f06f7e5ade4aa67): Job Start request submitted successfully with JOB_STATUS_HEALTHY
18.1.2.2. multi-generate subcommand
cpload multi-generate [-a|-add <pipeline-id> <data-path>] [(-t|--target) <target>] [(-p|--priority) <priority>] [-w|--delay <delay-seconds>] [--stats] <name>
Creates and starts a new job for each <pipeline-id>, <data-path> combination added. Each combination requires a separate -a flag. At least one <pipeline-id>, <data-path> combination must be included.
The jobs have a priority of <priority>. cpload pushes the contents of <data-path> to <target> as input for the new job. The name of each job is set to "cpload-<name>-<count>", where <count> is the number of the job created, starting at 1.
-a <pipeline-id> <data-path> is expected to give a valid pipeline ID and the data directory or file to run through that pipeline.
<target> is assumed to be localhost, or the current remote set by cpload logon, unless (-t|--target) <target> is provided.
<priority> is assumed to be normal unless (-p|--priority) <priority> is provided. Valid priority values are low, normal, high, or immediate.
<delay-seconds> is the delay between job creations and may be set to a non-negative number.
When --stats is used, cpload collects job statistics and returns control after all jobs are in the stopped state.
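When the pipeline and data pairs come from a script, it can help to assemble the multi-generate argument list programmatically. The sketch below only builds the argument list from the options documented above; build_multi_generate_args is a hypothetical helper, and the order in which options are emitted is an assumption (the tool may accept other orders).

```python
VALID_PRIORITIES = ("low", "normal", "high", "immediate")


def build_multi_generate_args(name, pairs, delay_seconds=None,
                              target=None, priority=None, stats=False):
    """Build an argv list for 'cpload multi-generate' from documented options."""
    if not pairs:
        raise ValueError("at least one (pipeline-id, data-path) pair is required")
    args = ["cpload", "multi-generate"]
    if target is not None:
        args += ["-t", target]
    if priority is not None:
        if priority not in VALID_PRIORITIES:
            raise ValueError(f"invalid priority: {priority}")
        args += ["-p", priority]
    if delay_seconds is not None:
        if delay_seconds < 0:
            raise ValueError("delay-seconds must be non-negative")
        args += ["-w", str(delay_seconds)]
    for pipeline_id, data_path in pairs:
        # Each combination needs its own -a flag.
        args += ["-a", pipeline_id, data_path]
    if stats:
        args.append("--stats")
    args.append(name)
    return args
```

The resulting list can be passed to subprocess.run by the caller; the sketch deliberately stops short of executing anything.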
18.1.2.2.1. Basic Sample Usage
# create pipeline 1
$ clara create pipeline -p sample-pipeline-1.yaml
PIPELINE_ID: <sample-pipeline-id-1>
# create pipeline 2
$ clara create pipeline -p sample-pipeline-2.yaml
PIPELINE_ID: <sample-pipeline-id-2>
# generate one job per pipeline (2 jobs) with a delay of 2 seconds between them
$ cpload multi-generate -w 2 -a <sample-pipeline-id-1> <input-for-pipeline-1> -a <sample-pipeline-id-2> <input-for-pipeline-2> sample-multi-job-name
sample-multi-job-name-multi-1(480e5121417d481f9c434a336e867fbf): Job Start request submitted successfully with JOB_STATUS_HEALTHY
# 2 seconds of delay
sample-multi-job-name-multi-2(5a28b0813099422a9fcadf70c72dfe15): Job Start request submitted successfully with JOB_STATUS_HEALTHY
18.1.2.3. target subcommand
cpload target <verb>
cpload target list prints the list of known targets.
cpload target set [(-f|--force)] <name> [<url>] sets the current active remote target by <name>.
If <name> is unknown, then <url> is required, and an error occurs if it is not provided. When <url> is provided, it is associated with <name> and stored for future use; if a target named <name> already exists, (-f|--force) must be provided or an error occurs.
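One way to read the target set rules is as operations on a name-to-URL table. The sketch below models that reading with a plain dictionary; set_target is a hypothetical name, and the interpretation that --force is only needed when a <url> would overwrite an existing entry is an assumption based on the sample usage, where an existing target is re-selected without --force.

```python
def set_target(targets, name, url=None, force=False):
    """Model of 'cpload target set': update the table and return the active name."""
    if name not in targets:
        if url is None:
            raise ValueError(f"unknown target '{name}': a url is required")
        targets[name] = url
    elif url is not None:
        if not force:
            raise ValueError(f"target '{name}' already exists: use --force to replace it")
        targets[name] = url
    return name


targets = {"localhost": "localhost"}
active = set_target(targets, "new-target-name", "example-url")  # stores and selects
active = set_target(targets, "localhost")                       # selects existing
```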
The value of <name> must be at least one character long and no longer than 255 characters, cannot contain any white-space or control characters, cannot start with a - character, and cannot contain any of the following characters:
! @ # $ % ^ & * ( ) < > = ? ' "
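The naming constraints above translate directly into a validation routine. This is a sketch, not the check cpload performs; is_valid_target_name is a hypothetical name, and treating "control characters" as Unicode category Cc is an assumption.

```python
import unicodedata

FORBIDDEN_CHARS = set("!@#$%^&*()<>=?'\"")


def is_valid_target_name(name):
    """Return True if name satisfies the documented target-name constraints."""
    if not 1 <= len(name) <= 255:
        return False
    if name.startswith("-"):
        return False
    for ch in name:
        if ch.isspace() or ch in FORBIDDEN_CHARS:
            return False
        if unicodedata.category(ch) == "Cc":  # assumption: control chars = category Cc
            return False
    return True
```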
cpload target rm <name> removes the target named <name>. If no target named <name> exists, an error occurs.
18.1.2.3.1. Basic Sample Usage
# get list of targets
$ cpload target list
Active target: localhost
Target_Name |URL
-------------------------------
localhost |localhost
# set new target
$ cpload target set new-target-name <new-target-url>
target 'new-target-name' saved
Active target: 'new-target-name'
# get updated list of targets
$ cpload target list
Active target: new-target-name
Target_Name |URL
--------------------------------
localhost |localhost
new-target-name |<new-target-url>
# set localhost back as active target
$ cpload target set localhost
Active target: 'localhost'
# remove target from the list
$ cpload target rm new-target-name
target 'new-target-name' deleted
# get updated list of targets
$ cpload target list
Active target: localhost
Target_Name |URL
-------------------------------
localhost |localhost