morpheus.utils.downloader.Downloader#
- class Downloader(
- download_method=DownloadMethods.DASK_THREAD,
- dask_heartbeat_interval='30s',
Bases:
object- Downloads a list of
fsspec.core.OpenFilesfiles using one of the following methods: single_thread, dask or dask_thread
The download method can be passed in via the
download_methodparameter or via theMORPHEUS_FILE_DOWNLOAD_TYPEenvironment variable. If both are set, the environment variable takes precedence, by defaultdask_threadis used.When using single_thread,
daskanddask.distributedis not reuiqrred to be installed.- Parameters:
- download_methodtyping.Union[DownloadMethods, str], optional, default = DownloadMethods.DASK_THREAD
The download method to use, if the
MORPHEUS_FILE_DOWNLOAD_TYPEenvironment variable is set, it takes presedence.- dask_heartbeat_intervalstr, optional, default = “30s”
The heartbeat interval to use when using dask or dask_thread.
- Attributes:
download_methodReturn the download method.
Methods
close()Close the dask cluster if it exists.
download(download_buckets, download_fn)Download the files in
download_bucketsusing the method specified in the constructor.Construct a dask client using the cluster created by
get_dask_clusterGet the dask cluster used by this downloader.
- download(download_buckets, download_fn)[source]#
Download the files in
download_bucketsusing the method specified in the constructor. If dask or dask_thread is used, theget_dask_client_fnfunction is used to create a dask client, otherwise it is not called. If using one of the other methods this can be set to None.- Parameters:
- download_bucketstyping.Iterable[fsspec.core.OpenFiles]
Files to download
- download_fntyping.Callable[[fsspec.core.OpenFiles], pd.DataFrame]
Function used to download an individual file and return the contents as a pandas DataFrame
- Returns:
- typing.List[pd.DataFrame]
- get_dask_client()[source]#
Construct a dask client using the cluster created by
get_dask_cluster- Returns:
- dask.distributed.Client
- Downloads a list of