stages.text.download.arxiv.download
#
Module Contents
Classes
| Class | Description |
|---|---|
| ArxivDownloader | Downloads Arxiv data from s3://arxiv/src/ |
API
- class stages.text.download.arxiv.download.ArxivDownloader(download_dir: str, verbose: bool = False)
Bases: nemo_curator.stages.text.download.DocumentDownloader
Downloads Arxiv data from s3://arxiv/src/
Initialization
Initialize the downloader.
Args:
    download_dir: Directory to store downloaded files.
    verbose: If True, logs detailed download information.
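The two constructor arguments can be illustrated with a self-contained sketch. It does not import NeMo Curator. `SketchArxivDownloader` and its `output_path` helper are hypothetical names, not part of the documented API. Two behaviors are also assumptions that this page does not confirm: that the directory is created up front, and that each file keeps its S3 object name locally. The example object name follows arXiv's `arXiv_src_YYMM_NNN.tar` bulk-source naming.

```python
import os
import tempfile
from urllib.parse import urlparse


class SketchArxivDownloader:
    """Hypothetical stand-in mirroring ArxivDownloader's documented constructor.

    Only ``download_dir`` and ``verbose`` come from the documented signature;
    ``output_path`` is an illustrative assumption, not the library's API.
    """

    def __init__(self, download_dir: str, verbose: bool = False) -> None:
        self.download_dir = download_dir
        self.verbose = verbose
        # Assumption: the target directory is created if it does not exist.
        os.makedirs(download_dir, exist_ok=True)

    def output_path(self, url: str) -> str:
        """Map an s3://arxiv/... object URL to a local file inside download_dir."""
        parsed = urlparse(url)
        if parsed.scheme != "s3" or parsed.netloc != "arxiv":
            raise ValueError(f"expected an s3://arxiv/... URL, got {url!r}")
        # Keep only the object's file name, e.g. "arXiv_src_1804_001.tar".
        filename = os.path.basename(parsed.path)
        if self.verbose:
            # Stands in for the "detailed download information" verbose enables.
            print(f"{url} -> {filename}")
        return os.path.join(self.download_dir, filename)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        downloader = SketchArxivDownloader(download_dir=tmp, verbose=True)
        print(downloader.output_path("s3://arxiv/src/arXiv_src_1804_001.tar"))
```

The sketch rejects URLs outside the `s3://arxiv` bucket so that a misconfigured source fails early instead of writing unexpected files into `download_dir`. This is also an assumption about desirable behavior, not documented behavior of the real class.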