nemo_curator.stages.text.download.common_crawl.warc_iterator
Module Contents
Classes
API
Bases: DocumentIterator
Processes downloaded WARC files.
Process a task containing WARC files and extract their contents.
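To illustrate what extracting the contents of a WARC file involves, here is a minimal, standard-library-only sketch of reading uncompressed WARC records from a byte stream. It is not the implementation in this module: the names `iter_warc_records` and `build_record` are hypothetical, the record layout is assumed to follow the WARC format (a version line, named header fields, a blank line, then a payload of `Content-Length` bytes), and real Common Crawl files are gzip-compressed and are normally read with a dedicated WARC library.

```python
import io
from typing import BinaryIO, Dict, Iterator


def iter_warc_records(stream: BinaryIO) -> Iterator[Dict[str, object]]:
    """Yield one dict per WARC record: its named headers plus the raw payload.

    Hypothetical helper for illustration; assumes an uncompressed stream.
    """
    while True:
        # Skip the blank lines that separate records; stop at end of stream.
        line = stream.readline()
        while line in (b"\r\n", b"\n"):
            line = stream.readline()
        if not line:
            return
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")

        # Read "Name: value" header lines until the blank line that ends them.
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield {"headers": headers, "content": payload}


def build_record(url: str, body: bytes) -> bytes:
    """Build a tiny synthetic WARC record (test data only, not real crawl data)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {url}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


sample = build_record("http://example.com/a", b"<html>A</html>") + build_record(
    "http://example.com/b", b"<html>B</html>"
)
records = list(iter_warc_records(io.BytesIO(sample)))
for record in records:
    print(record["headers"]["WARC-Target-URI"], len(record["content"]))
```

Reading the payload by its declared length, rather than scanning for a delimiter, keeps the sketch correct even when the payload itself contains blank lines.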