nemo_curator.stages.text.classifiers.domain
Module Contents
Classes
DomainClassifier
MultilingualDomainClassifier
Data
MULTILINGUAL_DOMAIN_MODEL_IDENTIFIER
API
DomainClassifier
Bases: DistributedDataClassifier
DomainClassifier classifies English text by domain using the NemoCurator Domain Classifier model (https://huggingface.co/nvidia/domain-classifier). It is optimized for multi-node, multi-GPU setups, enabling fast and efficient inference on large datasets.
MultilingualDomainClassifier
Bases: DistributedDataClassifier
MultilingualDomainClassifier classifies text by domain across 52 languages using the NemoCurator Multilingual Domain Classifier model (https://huggingface.co/nvidia/multilingual-domain-classifier). It is optimized for multi-node, multi-GPU setups, enabling fast and efficient inference on large datasets.
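The general pattern both classifiers follow (batch the text, run a model over each batch, attach a predicted domain to each record) can be sketched in plain Python. This is a conceptual illustration only and does not call the NeMo Curator API: `toy_domain_model` is a hypothetical keyword-based stand-in for the real model, and the field names `text` and `domain_pred`, the batch size, and the label strings are all assumptions chosen for the example.

```python
from typing import Callable

Record = dict[str, str]


def toy_domain_model(texts: list[str]) -> list[str]:
    """Hypothetical stand-in for the real model: keyword lookup, not a neural classifier."""
    labels = []
    for text in texts:
        lowered = text.lower()
        if "goal" in lowered or "match" in lowered:
            labels.append("Sports")
        elif "stock" in lowered or "market" in lowered:
            labels.append("Finance")
        else:
            labels.append("Other")
    return labels


def classify_in_batches(
    records: list[Record],
    model: Callable[[list[str]], list[str]],
    batch_size: int = 2,
    text_field: str = "text",          # assumed input field name
    label_field: str = "domain_pred",  # assumed output field name
) -> list[Record]:
    """Run the model one batch at a time and return copies of the records with a label added."""
    labeled = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        predictions = model([record[text_field] for record in batch])
        for record, prediction in zip(batch, predictions):
            labeled.append({**record, label_field: prediction})
    return labeled


docs = [
    {"text": "The striker scored a late goal."},
    {"text": "Stock prices fell as the market opened."},
    {"text": "A recipe for sourdough bread."},
]
labeled_docs = classify_in_batches(docs, toy_domain_model)

# Keep only the documents whose predicted domain is in an allow-list.
finance_docs = [doc for doc in labeled_docs if doc["domain_pred"] in {"Finance"}]
```

In the real classes, the stand-in function would be replaced by the Hugging Face models linked above, with inference distributed across GPUs rather than run in a single loop.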