Start, stop, and monitor downloads
AIS Downloader is intended for downloading massive numbers of files (objects) and datasets from both cloud storage (buckets) and the Internet.
Here’s the command’s help as of v3.30:
For details and background, please see the downloader’s own readme.
Table of Contents
- Start download job
- Stop download job
- Remove download job
- Show download jobs and job status
- Wait for download job
Start download job
ais start download SOURCE DESTINATION
Download the object(s) from the SOURCE location and save them as specified by the DESTINATION location.
The SOURCE location can be a link to a single object or to a range of objects:
- gs://lpr-vision/imagenet/imagenet_train-000000.tgz
- "gs://lpr-vision/imagenet/imagenet_train-{000000..000140}.tgz"
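The range form uses a {START..END} brace template. The following is a minimal Python sketch of how such a template could expand into individual object links. It assumes, as in bash brace expansion, that zero-padding is taken from the wider bound when either bound has a leading zero. The helper expand_range is hypothetical and not part of AIS.

```python
import re

# Matches a single numeric {START..END} range inside a link.
RANGE_RE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_range(template: str) -> list[str]:
    """Expand one {START..END} range into the list of concrete links."""
    m = RANGE_RE.search(template)
    if m is None:
        return [template]  # single-object link: nothing to expand
    start_s, end_s = m.group(1), m.group(2)
    # Assumption: pad to the wider bound if either bound is zero-padded.
    padded = any(len(b) > 1 and b.startswith("0") for b in (start_s, end_s))
    width = max(len(start_s), len(end_s)) if padded else 0
    prefix, suffix = template[: m.start()], template[m.end():]
    return [
        f"{prefix}{str(i).zfill(width)}{suffix}"
        for i in range(int(start_s), int(end_s) + 1)
    ]


links = expand_range("gs://lpr-vision/imagenet/imagenet_train-{000000..000140}.tgz")
print(len(links))  # 141
print(links[0])    # gs://lpr-vision/imagenet/imagenet_train-000000.tgz
print(links[-1])   # gs://lpr-vision/imagenet/imagenet_train-000140.tgz
```

The quotes around the range link in the list above matter on the command line: an unquoted {000000..000140} would typically be expanded by the shell itself before the CLI ever sees it.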
Currently, the following schemas are supported for the SOURCE location:
- ais:// - refers to an AIS cluster. The IP address and port number of the cluster’s proxy should follow the protocol; if the port number is omitted, “8080” is used. E.g., ais://172.67.50.120:8080/bucket/imagenet_train-{0..100}.tgz. Can be used to copy objects between buckets of the same cluster, or to download objects from any remote AIS cluster.
- aws:// or s3:// - refers to Amazon Web Services S3 storage, e.g., s3://bucket/sub_folder/object_name.tar
- azure:// or az:// - refers to Azure Blob Storage, e.g., az://bucket/sub_folder/object_name.tar
- gcp:// or gs:// - refers to Google Cloud Storage, e.g., gs://bucket/sub_folder/object_name.tar
- http:// or https:// - refers to an external link somewhere on the web, e.g., http://releases.ubuntu.com/18.04.1/ubuntu-18.04.1-desktop-amd64.iso
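To make the schema list concrete, here is a rough Python sketch that normalizes a SOURCE link into a provider plus the rest of the link, including the 8080 default for an ais:// proxy. The function parse_source and the provider labels are hypothetical illustrations; the CLI's actual parsing may differ.

```python
from urllib.parse import urlsplit

# Aliases from the schema list, mapped to one (hypothetical) provider label.
PROVIDERS = {
    "ais": "ais",
    "aws": "aws", "s3": "aws",
    "azure": "azure", "az": "azure",
    "gcp": "gcp", "gs": "gcp",
    "http": "http", "https": "http",
}


def parse_source(link: str) -> dict:
    parts = urlsplit(link)
    provider = PROVIDERS.get(parts.scheme)
    if provider is None:
        raise ValueError(f"unsupported schema: {parts.scheme!r}")
    if provider == "ais":
        # For ais://, the netloc is the proxy's IP[:PORT]; the port defaults to 8080.
        port = parts.port or 8080
        bucket, _, obj = parts.path.lstrip("/").partition("/")
        return {"provider": "ais", "proxy": f"{parts.hostname}:{port}",
                "bucket": bucket, "object": obj}
    if provider == "http":
        return {"provider": "http", "url": link}
    # Cloud schemas: the netloc is the bucket, the path is the object name.
    return {"provider": provider, "bucket": parts.netloc,
            "object": parts.path.lstrip("/")}


print(parse_source("ais://172.67.50.120/bucket/imagenet_train-{0..100}.tgz")["proxy"])
# 172.67.50.120:8080
```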
The DESTINATION location should be in the form schema://bucket/sub_folder/object_name, where:
- schema:// - the schema specifying the provider of the destination bucket (ais://, aws://, azure://, gcp://)
- bucket - the name of the bucket where the object(s) will be stored
- sub_folder/object_name - when downloading a single file, the name under which the object is saved in the AIS cluster.
If the DESTINATION bucket doesn’t exist, a new bucket with the default properties (as defined by the global configuration) will be automatically created.
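Here is a minimal sketch of splitting a DESTINATION into its three parts and choosing the resulting object name. It relies on two hypothetical rules that this page does not confirm. First, a path that is empty or ends with / is treated as a sub_folder prefix, as in the range examples that save into a subdirectory. Second, in that case the last segment of the source link is appended. The functions parse_destination and object_name are illustrations only.

```python
from urllib.parse import urlsplit

# Destination schemas as listed above.
DEST_SCHEMAS = {"ais", "aws", "azure", "gcp"}


def parse_destination(dest: str) -> tuple[str, str, str]:
    """Split schema://bucket/sub_folder/object_name into (schema, bucket, path)."""
    parts = urlsplit(dest)
    if parts.scheme not in DEST_SCHEMAS:
        raise ValueError(f"unsupported destination schema: {parts.scheme!r}")
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


def object_name(source: str, dest: str) -> str:
    """Hypothetical naming rule for where one source object ends up."""
    _, _, path = parse_destination(dest)
    basename = source.rstrip("/").rsplit("/", 1)[-1]
    if path == "" or path.endswith("/"):
        return path + basename  # path acts as a sub_folder prefix
    return path  # explicit object name


print(object_name("gs://lpr-vision/imagenet/imagenet_train-000000.tgz",
                  "ais://local-lpr/imagenet/"))
# imagenet/imagenet_train-000000.tgz
```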
Options
Examples
Download single file
Download the object ubuntu-18.04.1-desktop-amd64.iso from the specified HTTP location and save it in the ubuntu bucket as ubuntu-18.04.1.iso.
Download range of files from GCP
Download all objects in the range from gs://lpr-vision/imagenet/imagenet_train-000000.tgz to gs://lpr-vision/imagenet/imagenet_train-000140.tgz and save them in the local-lpr bucket, inside the imagenet subdirectory.
Errors may happen during the download. The downloader logs and persists all errors, so they can be accessed both during and after the run.
The job details also remain accessible after the job finishes (or after it has been aborted).
Download range of files from GCP with limited connections
Download all objects in the range from gs://lpr-vision/imagenet/imagenet_train-000000.tgz to gs://lpr-vision/imagenet/imagenet_train-000140.tgz and save them in the local-lpr bucket, inside the imagenet subdirectory.
Because each target is limited to a single concurrent connection, only 4 files are downloaded at any given time (the job was started on a cluster with 4 targets).
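The number of files in flight is bounded by targets × per-target connection limit. The small asyncio simulation below shows this. It hypothetically assigns objects to targets round-robin. In AIS, each object is owned by a target chosen by hashing its name, so per-target counts can be uneven. The function simulate is an illustration, not downloader code.

```python
import asyncio


async def simulate(num_objects: int, num_targets: int, limit_per_target: int) -> int:
    """Return the peak number of simultaneous downloads across the cluster."""
    # One semaphore per target models its connection limit.
    sems = [asyncio.Semaphore(limit_per_target) for _ in range(num_targets)]
    in_flight = 0
    peak = 0

    async def download(i: int) -> None:
        nonlocal in_flight, peak
        # Hypothetical round-robin assignment of objects to targets.
        async with sems[i % num_targets]:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.001)  # stand-in for the actual transfer
            in_flight -= 1

    await asyncio.gather(*(download(i) for i in range(num_objects)))
    return peak


print(asyncio.run(simulate(141, 4, 1)))  # 4
```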
Download range of files from another AIS cluster
Download all objects in the range from imagenet_train-0022 to imagenet_train-0140 from the imagenet bucket of another AIS cluster (172.100.10.10:8080), and save them on the local AIS cluster in the local-lpr bucket, inside the set_1 subdirectory.
Download whole GCP bucket
Download all objects contained in the gcp://lpr-vision bucket and save them into the lpr-vision-copy AIS bucket.
Note that this feature is only available when ais://lpr-vision-copy is connected to the backend cloud bucket gcp://lpr-vision.
Sync whole GCP bucket
Sometimes we suspect, or know, that the contents of a previously downloaded cloud bucket have changed.
By default, the downloader only downloads new objects and updates outdated ones; it doesn’t check whether cached objects are still present in the cloud.
To change this behavior, specify the --sync flag: the downloader will then also remove cached objects that are no longer present in the cloud.
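The difference between the default behavior and --sync can be modeled by comparing a cloud listing with the cached copies. This Python sketch is hypothetical. It identifies objects by name and compares them by a version string, which may not match how the downloader actually detects outdated objects. The function plan is an illustration only.

```python
def plan(cloud: dict[str, str], cached: dict[str, str],
         sync: bool = False) -> dict[str, list[str]]:
    """Given name -> version maps, return what a download pass would do."""
    new = sorted(n for n in cloud if n not in cached)
    outdated = sorted(n for n in cloud if n in cached and cached[n] != cloud[n])
    # Cached objects that are gone from the cloud are removed only with sync.
    stale = sorted(n for n in cached if n not in cloud) if sync else []
    return {"download": new + outdated, "remove": stale}


cloud = {"a.tgz": "v2", "b.tgz": "v1", "d.tgz": "v1"}
cached = {"a.tgz": "v1", "b.tgz": "v1", "c.tgz": "v1"}
print(plan(cloud, cached))             # {'download': ['d.tgz', 'a.tgz'], 'remove': []}
print(plan(cloud, cached, sync=True))  # {'download': ['d.tgz', 'a.tgz'], 'remove': ['c.tgz']}
```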
Job starting, stopping (i.e., aborting), and monitoring commands all have equivalent shorter versions. For instance, ais wait download Z8WkHxwIrr is the same as ais wait Z8WkHxwIrr.
Download GCP bucket objects with prefix
Download the objects in the gcp://lpr-vision bucket whose names start with dir/prefix- and save them into the lpr-vision-copy AIS bucket.
Note that this feature is only available when ais://lpr-vision-copy is connected to the backend cloud bucket gcp://lpr-vision.
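Selecting by prefix presumably amounts to a plain name-prefix match over the bucket listing. Under that assumption, an empty prefix corresponds to the whole-bucket download described earlier. The listing below is made up and select is a hypothetical helper.

```python
def select(listing: list[str], prefix: str = "") -> list[str]:
    """Return object names that start with prefix (empty prefix = whole bucket)."""
    return [name for name in listing if name.startswith(prefix)]


listing = ["dir/prefix-1.tgz", "dir/prefix-2.tgz", "dir/other.tgz", "prefix-3.tgz"]
print(select(listing, "dir/prefix-"))  # ['dir/prefix-1.tgz', 'dir/prefix-2.tgz']
print(len(select(listing)))            # 4
```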
Download multiple objects from GCP
Download all objects listed in the objects.txt file.
The source and each object name from the file are concatenated (with /) to form the full link to the external object.
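Here is a sketch of that concatenation. It assumes the file has already been read into a list of names; the actual format of objects.txt is not shown here. Collapsing a duplicate slash at the join point is a design choice of this sketch, and full_links is a hypothetical helper.

```python
def full_links(source: str, names: list[str]) -> list[str]:
    """Join SOURCE and each object name with exactly one '/'."""
    base = source.rstrip("/")
    return [f"{base}/{name.lstrip('/')}" for name in names]


names = ["imagenet/imagenet_train-000001.tgz", "imagenet/imagenet_train-000002.tgz"]
for link in full_links("gs://lpr-vision", names):
    print(link)
# gs://lpr-vision/imagenet/imagenet_train-000001.tgz
# gs://lpr-vision/imagenet/imagenet_train-000002.tgz
```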
Stop download job
ais stop download JOB_ID
Stop the download job with the given JOB_ID.
Remove download job
ais job rm download JOB_ID
Remove the finished download job with the given JOB_ID from the job list.
Show download jobs and job status
ais show job download [JOB_ID]
Show all download jobs, or the status of a specific job.
Options
Examples
Show progress of given download job
Show progress bars for each file currently being downloaded, with a refresh rate of 500 ms.
Show download jobs whose descriptions match a given regex
Show all download jobs whose descriptions start with the prefix "download".
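Matching descriptions against a regex can be sketched with Python's re module. The job IDs and descriptions below are invented for illustration. The pattern ^download is one way to express "starts with download", though the CLI's own regex dialect may differ. The function matching is a hypothetical helper.

```python
import re

# Invented jobs: ID -> description.
jobs = {
    "job-1": "download ubuntu iso",
    "job-2": "downloads imagenet shards",
    "job-3": "nightly copy of lpr-vision",
}


def matching(jobs: dict[str, str], pattern: str) -> list[str]:
    """Return IDs of jobs whose description matches pattern (via re.search)."""
    rx = re.compile(pattern)
    return sorted(job_id for job_id, desc in jobs.items() if rx.search(desc))


print(matching(jobs, r"^download"))  # ['job-1', 'job-2']
```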
Wait for download job
ais wait download JOB_ID
Wait for the download job with the given JOB_ID to finish.
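Conceptually, waiting means polling the job's status until it reaches a terminal state. This is a minimal sketch built on several hypothetical pieces: a get_status callable (not an AIS API), the state names "finished" and "aborted", and a 500 ms default interval chosen to echo the refresh rate mentioned above.

```python
import time
from typing import Callable, Optional


def wait_for(get_status: Callable[[], str], interval: float = 0.5,
             timeout: Optional[float] = None) -> str:
    """Poll get_status() until it reports a terminal state or the timeout expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("finished", "aborted"):  # hypothetical terminal states
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("download job did not finish in time")
        time.sleep(interval)


# Fake status source that reports completion on the third poll.
states = iter(["running", "running", "finished"])
print(wait_for(lambda: next(states), interval=0.01))  # finished
```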