Copyright © 2026, NVIDIA Corporation.


Text Embedding


Generate text embeddings for large-scale datasets using NeMo Curator’s built-in embedding stages. Text embeddings enable downstream tasks such as semantic deduplication, similarity search, and clustering.

How It Works

NeMo Curator provides three embedding backends for text data, each suited to different model sizes and throughput requirements:

  1. EmbeddingCreatorStage — A composite stage that handles tokenization and embedding in sequence. Supports both Sentence Transformers’ SentenceTransformer and Hugging Face’s AutoModel classes via the use_sentence_transformer flag.
  2. VLLMEmbeddingModelStage — A standalone stage that uses vLLM for GPU-accelerated embedding generation with optional pretokenization. Best for large embedding models where vLLM’s batching and GPU utilization provide significant throughput gains.
  3. SentenceTransformerEmbeddingModelStage — A model stage that uses the sentence-transformers library directly. Used internally by EmbeddingCreatorStage when use_sentence_transformer=True.

Choosing an Embedding Backend

| Backend | Best For | GPU Utilization | Setup |
|---|---|---|---|
| EmbeddingCreatorStage (Sentence Transformers) | Small to medium models (e.g., all-MiniLM-L6-v2) | Good | Included in text_cuda12 extra |
| VLLMEmbeddingModelStage | Large models (e.g., google/embeddinggemma-300m) and semantic deduplication | Excellent | Included in text_cuda12 extra |
| EmbeddingCreatorStage (AutoModel) | Custom pooling strategies | Good | Set use_sentence_transformer=False |
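
"Pooling" in the AutoModel row refers to how per-token model outputs are collapsed into a single vector per document. The sketch below is a plain-Python illustration of one common strategy, masked mean pooling. It is not NeMo Curator's implementation, and `mean_pool` and the toy values are hypothetical names and data chosen for the example.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask value is 1, skipping padding."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three real tokens followed by one padding token (mask 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [3.0, 4.0]
```

Other strategies (for example, taking only the last non-padding token) differ only in which token vectors are selected before reduction.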

Benchmarks on 5 GB of Common Crawl data show that vLLM outperforms Sentence Transformers for larger embedding models, while Sentence Transformers is faster for smaller models. The vLLM pretokenize mode provides the best per-task throughput across both model sizes when amortized over many tasks.
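
The guidance in this section can be summarized as a small decision helper. This is only a sketch: `suggest_backend` is a hypothetical function, not part of NeMo Curator, and the order in which it checks the conditions (custom pooling first) is an assumption rather than a documented rule.

```python
def suggest_backend(
    large_model: bool,
    custom_pooling: bool = False,
    semantic_dedup: bool = False,
) -> str:
    """Map the selection guidance to a stage name (illustrative only)."""
    # Assumption: a need for custom pooling takes priority over the other criteria.
    if custom_pooling:
        return "EmbeddingCreatorStage(use_sentence_transformer=False)"
    # Large models and semantic deduplication both point to the vLLM stage.
    if large_model or semantic_dedup:
        return "VLLMEmbeddingModelStage"
    # Small to medium models default to the Sentence Transformers path.
    return "EmbeddingCreatorStage(use_sentence_transformer=True)"


print(suggest_backend(large_model=False))
print(suggest_backend(large_model=True))
```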

Quick Start

EmbeddingCreatorStage

    from nemo_curator.backends.xenna import XennaExecutor
    from nemo_curator.stages.text.embedders import EmbeddingCreatorStage
    from nemo_curator.pipeline import Pipeline
    from nemo_curator.stages.text.io.reader import ParquetReader
    from nemo_curator.stages.text.io.writer import ParquetWriter

    pipeline = Pipeline(
        name="text_embeddings",
        stages=[
            ParquetReader(file_paths="input_data/", files_per_partition=1, fields=["text"]),
            EmbeddingCreatorStage(
                model_identifier="sentence-transformers/all-MiniLM-L6-v2",
                text_field="text",
                embedding_field="embeddings",
                model_inference_batch_size=256,
            ),
            ParquetWriter(path="output/", fields=["text", "embeddings"]),
        ],
    )

    executor = XennaExecutor()
    pipeline.run(executor)
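
To use the Hugging Face AutoModel path instead, the same pipeline can presumably be built with `use_sentence_transformer=False`. The following is an untested sketch: every argument name is taken from the example above and from the flag described in "How It Works", while the function name `build_automodel_pipeline`, the pipeline name, and the default paths are hypothetical. The imports sit inside the function so the module can be loaded on machines without NeMo Curator installed.

```python
def build_automodel_pipeline(input_path: str = "input_data/", output_path: str = "output/"):
    """Build (but do not run) an embedding pipeline that uses the AutoModel path."""
    # Lazy imports: these require NeMo Curator to be installed when the function is called.
    from nemo_curator.pipeline import Pipeline
    from nemo_curator.stages.text.embedders import EmbeddingCreatorStage
    from nemo_curator.stages.text.io.reader import ParquetReader
    from nemo_curator.stages.text.io.writer import ParquetWriter

    return Pipeline(
        name="text_embeddings_automodel",
        stages=[
            ParquetReader(file_paths=input_path, files_per_partition=1, fields=["text"]),
            EmbeddingCreatorStage(
                model_identifier="sentence-transformers/all-MiniLM-L6-v2",
                # Switch from SentenceTransformer to Hugging Face AutoModel.
                use_sentence_transformer=False,
                text_field="text",
                embedding_field="embeddings",
                model_inference_batch_size=256,
            ),
            ParquetWriter(path=output_path, fields=["text", "embeddings"]),
        ],
    )
```

Running it follows the same pattern as the example above: create an `XennaExecutor` and pass it to the returned pipeline's `run` method.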

VLLMEmbeddingModelStage (Recommended for Semantic Deduplication)

VLLMEmbeddingModelStage is the default embedding backend for semantic deduplication, using google/embeddinggemma-300m. For large embedding models, it provides better GPU utilization and throughput than the Sentence Transformers path. See the vLLM Embedder guide for setup, configuration, and code examples.


Available Embedding Tools

vLLM Embedder

Generate embeddings using vLLM for high-throughput GPU-accelerated inference with large embedding models.


Integration with Semantic Deduplication

Text embeddings are a key input for semantic deduplication. The TextSemanticDeduplicationWorkflow uses VLLMEmbeddingModelStage internally, but you can also generate embeddings separately and feed them into the deduplication workflow for more control over the embedding process.
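
To make the role of embeddings in deduplication concrete, here is a toy illustration of the underlying idea: documents whose embedding vectors have high cosine similarity are treated as near-duplicates. This is not how TextSemanticDeduplicationWorkflow is implemented. The function names, the two-dimensional vectors, and the 0.95 threshold are all hypothetical, and the brute-force pairwise loop would not scale to large datasets.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(embeddings: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with i < j whose cosine similarity is >= threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy embeddings: documents 0 and 1 point in almost the same direction.
embeddings = [
    [1.0, 0.0],
    [0.99, 0.05],
    [0.0, 1.0],
]
print(near_duplicate_pairs(embeddings, threshold=0.95))  # [(0, 1)]
```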