Machine‑Learning Operations (ais ml)
Introduced in v3.30, the ml namespace is intended for commands that target ML‑centric data
workflows — bulk extraction of training samples, cross‑bucket collation of model
artifacts, and manifest‑driven slicing of large corpora.
In this document:
ais ml get-batch – one‑shot consolidation of objects (and archived
sub‑objects) into a single TAR/TGZ/ZIP/TAR.LZ4.
ais ml lhotse-get-batch – higher‑level driver that reads Lhotse cut
manifests and spawns one or many get-batch jobs on your behalf.
--help is available at every level, for example ais ml --help or ais ml get-batch --help.
ais ml get-batch
Fetch one set of inputs (objects, archived files, or byte ranges), possibly
spanning multiple buckets and providers, and package them into a single
output archive.
The default format is TAR; to switch, give the destination name a .tgz, .zip,
.tar.lz4, or similar extension.
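The extension-driven format choice described above can be pictured with a small sketch. This is only an illustration of the idea, not the CLI's actual logic: the helper name `infer_archive_format` and the extension table are assumptions, limited to the formats this document names.

```python
# Hypothetical helper: picks an archive format from a destination name.
# The extension list mirrors the formats named in this document; it is
# NOT taken from the AIStore source code.
KNOWN_EXTENSIONS = (".tar.lz4", ".tgz", ".zip", ".tar")


def infer_archive_format(dest_name: str, default: str = ".tar") -> str:
    """Return the matching known extension, or the TAR default."""
    lowered = dest_name.lower()
    for ext in KNOWN_EXTENSIONS:
        if lowered.endswith(ext):
            return ext
    return default


if __name__ == "__main__":
    for name in ("train.tar", "train.tgz", "shard.tar.lz4", "samples.zip", "batch"):
        print(name, "->", infer_archive_format(name))
```

A name with no recognized extension falls back to TAR, matching the default stated above.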
Typical uses:
--streaming).
Full proto definition: api/apc/ml.go.
ais ml lhotse-get-batch
Consumes a Lhotse cuts.jsonl[.gz | .lz4] manifest and spawns one or many
get-batch transactions. Ideal for speech/ASR pipelines where a manifest
describes thousands of time offsets across many recordings.
Key distinction:
In a Lhotse manifest, each cut JSON line lists one or more recording sources (URI + byte/time offsets).
Manifests may be:
plain: cuts.jsonl
gzip: cuts.jsonl.gz or cuts.jsonl.gzip
lz4: cuts.jsonl.lz4
Each line is an independent cut. See the Lhotse docs for the full schema.
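To make the one-cut-per-line layout concrete, here is a minimal sketch of reading such a manifest in Python. It is not part of the CLI. The function name `iter_cuts` is hypothetical, and lz4 is deliberately left out because decoding it needs a third-party package.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_cuts(manifest_path: str) -> Iterator[Dict[str, Any]]:
    """Yield one decoded cut (a dict) per non-empty manifest line."""
    path = Path(manifest_path)
    name = path.name.lower()
    if name.endswith(".lz4"):
        # Assumption: lz4 support is out of scope for this stdlib-only sketch.
        raise NotImplementedError("lz4 manifests need a third-party decompressor")
    if name.endswith((".gz", ".gzip")):
        fh = gzip.open(path, "rt", encoding="utf-8")
    else:
        fh = open(path, "rt", encoding="utf-8")
    with fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

Because each line is an independent JSON document, the manifest can be streamed without loading the whole file into memory.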
The remaining flags (--list, --template, --prefix, --omit-src-bck, --streaming, --cont-on-err, --yes, --nv) behave exactly as they do in get-batch.
Streaming (--streaming) vs multipart (buffered)
The flag only toggles how AIStore emits the response — nothing changes in the CLI itself.
Leaving bucket / provider blank in a spec entry
With ais ml get-batch ais://my‑bck --spec …, any entry with an empty bucket field inherits my‑bck.
If provider is omitted, the cluster assumes ais://. For S3, GCS, Azure, OCI, etc., always spell out "provider":"s3", "gcp", … per entry.
Manifests at scale. lhotse-get-batch --batch-size divides the manifest
on the client side, so there is no need to pre‑split the file.
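Client-side division of a manifest amounts to grouping its cuts into fixed-size chunks. The sketch below shows that general technique only. It is not the CLI's implementation, and the name `batched` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(items)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return  # input exhausted
        yield chunk


if __name__ == "__main__":
    # Five placeholder cut IDs split into batches of two.
    for batch in batched(["c1", "c2", "c3", "c4", "c5"], 2):
        print(batch)
```

The last chunk may be shorter than the rest, so a consumer should not assume every batch is full.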
Check specs into Git. Treat JSON/YAML batch specs as immutable artifacts for reproducibility.
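The bucket and provider fallback rules described earlier can be written out as a short sketch. It only models the rules as stated in this document. The function name `apply_defaults` and the `objname` field are assumptions made for illustration; only `bucket` and `provider` appear in the text above.

```python
from typing import Any, Dict, List


def apply_defaults(entries: List[Dict[str, Any]], default_bucket: str) -> List[Dict[str, Any]]:
    """Return copies of spec entries with blank bucket/provider filled in."""
    resolved = []
    for entry in entries:
        filled = dict(entry)  # leave the caller's entry untouched
        if not filled.get("bucket"):
            # An empty bucket inherits the bucket given on the command line.
            filled["bucket"] = default_bucket
        if not filled.get("provider"):
            # An omitted provider is treated as ais.
            filled["provider"] = "ais"
        resolved.append(filled)
    return resolved


if __name__ == "__main__":
    spec = [
        {"objname": "a.wav"},  # hypothetical field name
        {"objname": "b.wav", "bucket": "other", "provider": "s3"},
    ]
    for item in apply_defaults(spec, "my-bck"):
        print(item)
```

Entries that already name a bucket and provider pass through unchanged, which is why remote providers should be spelled out per entry.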
See also: apc/ml.go, ais ml lhotse-get-batch (WIP), ais search.