nemo_curator.stages.text.download.arxiv.download


Module Contents

Classes

Name | Description
ArxivDownloader | Downloads Arxiv data from s3://arxiv/src/

API

class nemo_curator.stages.text.download.arxiv.download.ArxivDownloader(
download_dir: str,
verbose: bool = False
)

Bases: DocumentDownloader

Downloads Arxiv data from s3://arxiv/src/

nemo_curator.stages.text.download.arxiv.download.ArxivDownloader._download_to_path(
url: str,
path: str
) -> tuple[bool, str | None]

nemo_curator.stages.text.download.arxiv.download.ArxivDownloader._get_output_filename(
url: str
) -> str
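
The two method signatures above suggest a small contract: one hook maps a URL to a local file name, and the other fetches that URL to a path and reports a success flag plus an optional error message. The sketch below illustrates that contract under stated assumptions. It does not use NeMo Curator or S3. `SketchDownloader`, `LocalCopyDownloader`, the `download` driver method, and the `source_dir` parameter are hypothetical names introduced only for illustration, and reading the `tuple[bool, str | None]` return value as `(success, error_message)` is an inference from the type, not something this page states.

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod
from typing import Optional, Tuple


class SketchDownloader(ABC):
    """Hypothetical stand-in for a downloader base class (not the NeMo Curator one)."""

    def __init__(self, download_dir: str, verbose: bool = False) -> None:
        self.download_dir = download_dir
        self.verbose = verbose
        os.makedirs(download_dir, exist_ok=True)

    @abstractmethod
    def _get_output_filename(self, url: str) -> str:
        """Map a URL (or object key) to a file name inside download_dir."""

    @abstractmethod
    def _download_to_path(self, url: str, path: str) -> Tuple[bool, Optional[str]]:
        """Fetch url to path; assumed to return (success, error_message)."""

    def download(self, url: str) -> Optional[str]:
        """Assumed driver: resolve the file name, fetch, return the path or None."""
        path = os.path.join(self.download_dir, self._get_output_filename(url))
        ok, err = self._download_to_path(url, path)
        if not ok:
            if self.verbose:
                print(f"Failed to download {url}: {err}")
            return None
        return path


class LocalCopyDownloader(SketchDownloader):
    """Copies files from a local directory instead of reading from S3."""

    def __init__(self, download_dir: str, source_dir: str, verbose: bool = False) -> None:
        super().__init__(download_dir, verbose)
        self.source_dir = source_dir  # hypothetical stand-in for the remote bucket

    def _get_output_filename(self, url: str) -> str:
        # Keep only the final path component as the local file name.
        return os.path.basename(url)

    def _download_to_path(self, url: str, path: str) -> Tuple[bool, Optional[str]]:
        try:
            shutil.copyfile(os.path.join(self.source_dir, url), path)
        except OSError as exc:
            # Report the failure through the return value instead of raising.
            return False, str(exc)
        return True, None


if __name__ == "__main__":
    src = tempfile.mkdtemp()
    dst = tempfile.mkdtemp()
    with open(os.path.join(src, "example.tar"), "wb") as fh:
        fh.write(b"placeholder bytes")

    downloader = LocalCopyDownloader(dst, src, verbose=True)
    print(downloader.download("example.tar"))  # local path of the copied file
    print(downloader.download("missing.tar"))  # None, after a failure message
```

Returning the error as a value rather than raising lets a caller skip one failed file and continue with the rest of a batch, which is one plausible reason for the tuple-shaped return type.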