morpheus.utils.downloader.Downloader
- class Downloader(download_method=DownloadMethods.DASK_THREAD, dask_heartbeat_interval='30s')[source]
Bases:
object
- Downloads a list of
fsspec.core.OpenFiles
files using one of the following methods: single_thread, dask or dask_thread
The download method can be passed in via the
download_method
parameter or via theMORPHEUS_FILE_DOWNLOAD_TYPE
environment variable. If both are set, the environment variable takes precedence, by defaultdask_thread
is used.When using single_thread,
dask
anddask.distributed
is not reuiqrred to be installed.- Parameters
- download_methodtyping.Union[DownloadMethods, str], optional, default = DownloadMethods.DASK_THREAD
The download method to use, if the
MORPHEUS_FILE_DOWNLOAD_TYPE
environment variable is set, it takes presedence.- dask_heartbeat_intervalstr, optional, default = “30s”
The heartbeat interval to use when using dask or dask_thread.
- Attributes
download_method
Return the download method.
Methods
close
()Close the dask cluster if it exists. download
(download_buckets, download_fn)Download the files in download_buckets
using the method specified in the constructor.get_dask_client
()Construct a dask client using the cluster created by get_dask_cluster
get_dask_cluster
()Get the dask cluster used by this downloader. - close()[source]
Close the dask cluster if it exists.
- download(download_buckets, download_fn)[source]
Download the files in
download_buckets
using the method specified in the constructor. If dask or dask_thread is used, theget_dask_client_fn
function is used to create a dask client, otherwise it is not called. If using one of the other methods this can be set to None.- Parameters
- download_bucketstyping.Iterable[fsspec.core.OpenFiles]
Files to download
- download_fntyping.Callable[[fsspec.core.OpenFiles], pd.DataFrame]
Function used to download an individual file and return the contents as a pandas DataFrame
- Returns
- typing.List[pd.DataFrame]
- property download_method: str
Return the download method.
- get_dask_client()[source]
Construct a dask client using the cluster created by
get_dask_cluster
- Returns
- dask.distributed.Client
- get_dask_cluster()[source]
Get the dask cluster used by this downloader. If the cluster does not exist, it is created.
- Returns
- dask.distributed.LocalCluster
- Downloads a list of