Knowledge Layer#

A pluggable abstraction for document ingestion and retrieval. Swap backends without changing application code.

Looking to build a custom backend adapter? Refer to the SDK Reference for data schemas, interfaces, and implementation examples.

Key Features#

Rich Output Schema - Chunk model with 12 fields: content types, citations, images, structured data
Full Ingestion Pipeline - BaseIngestor with async job tracking and status polling
Collection Management - create/delete/list collections per session or use case
File Management - upload/delete/list files with status tracking (UPLOADING -> INGESTING -> SUCCESS/FAILED)
Content Typing - TEXT, TABLE, CHART, IMAGE enums for frontend rendering
Backend Agnostic - Swap between local (LlamaIndex) and hosted (RAG Blueprint) without core agent code changes

Available Backends#

Backend	Config Name	Mode	Vector Store	Best For
`llamaindex`	`"llamaindex"`	Local Library	ChromaDB	Dev, prototyping, macOS/Linux
`foundational_rag`	`"foundational_rag"`	Hosted Service	Remote Milvus	Production, multi-user

Local Library Mode - Everything runs in your Python process. No external services needed.

llamaindex - LlamaIndex + ChromaDB. Lightweight, great for development. Works on macOS and Linux.

Hosted Service Mode - Connects to deployed services through HTTP. Requires infrastructure but scales better.

foundational_rag - Connects to NVIDIA RAG Blueprint through HTTP.
- Tested with: NVIDIA RAG Blueprint v2.4.0 (Helm chart nvidia-blueprint-rag)
- Deployment Guide
- Backend-specific documentation: sources/knowledge_layer/src/foundational_rag/README.md

Quick Start#

Before you begin document ingestion and retrieval, run the following commands to install a knowledge layer backend.

Prerequisites: Complete the main setup first (refer to the project README.md): clone repo, run ./scripts/setup.sh, obtain API keys.

Tip: Instead of exporting env vars each time, add them to deploy/.env and use dotenv -f deploy/.env run <command> to run any command with those vars loaded automatically.

# 1. Set up environment variables (add to deploy/.env to avoid exporting each time)
export NVIDIA_API_KEY=nvapi-your-key-here

# 2. Install backend (choose one)
uv pip install -e "sources/knowledge_layer[llamaindex]"        # Recommended for local dev - works on macOS/Linux
uv pip install -e "sources/knowledge_layer[foundational_rag]"  # Requires deployed server

New to Knowledge Layer? Start with llamaindex - it requires no external services and works on macOS and Linux.

# 3. Verify
python -c "from aiq_agent.knowledge import get_retriever; print('OK')"

Usage#

Configure the knowledge layer through your workflow's YAML config file, or call it directly from Python.

With NeMo Agent Toolkit (YAML Config) - Recommended#

The knowledge_retrieval function is registered as a NeMo Agent Toolkit function type. YAML config is the recommended single source of truth for workflow configuration:

# Example knowledge_retrieval function configuration
functions:
  knowledge_search:
    _type: knowledge_retrieval      # NeMo Agent Toolkit function type
    backend: llamaindex             # Required: which adapter to use
    collection_name: my_docs        # Required: target collection
    top_k: 5                        # Results to return

    # Summarization options (optional, all backends):
    # generate_summary: true                  # Generate one-sentence summary per document
    # summary_model: nemotron_nano_llm             # LLM reference from llms: section (required if generate_summary is true)
    # summary_db: sqlite+aiosqlite:///./summaries.db  # Summary storage (SQLite or PostgreSQL)

    # Backend-specific options (each backend uses different fields):
    chroma_dir: /tmp/chroma_data              # llamaindex only
    rag_url: http://localhost:8081/v1         # foundational_rag only
    ingest_url: http://localhost:8082/v1      # foundational_rag only
    timeout: 120                              # foundational_rag only
    # verify_ssl: true                        # foundational_rag only (set false for self-signed certs)

You can also use environment variable substitution in YAML for sensitive values:

functions:
  knowledge_search:
    _type: knowledge_retrieval
    backend: foundational_rag
    rag_url: ${RAG_SERVER_URL:-http://localhost:8081/v1}
    collection_name: ${COLLECTION_NAME:-default}
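The `${VAR:-default}` form follows the familiar shell convention: use the variable if it is set and non-empty, otherwise fall back to the default. The sketch below mimics that rule in plain Python to show how the two values above would resolve. The `substitute_env` helper is hypothetical, and the toolkit's own substitution may differ in edge cases.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute_env(value: str, env=None) -> str:
    """Expand ${VAR} and ${VAR:-default} using shell-style fallback rules."""
    env = os.environ if env is None else env

    def _replace(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        # ":-" falls back when the variable is unset or empty.
        if current:
            return current
        return default if default is not None else ""

    return _PATTERN.sub(_replace, value)
```

For example, with `RAG_SERVER_URL` unset, `substitute_env("${RAG_SERVER_URL:-http://localhost:8081/v1}", env={})` yields the localhost default.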

Note: Each backend has different config options. Only the options matching your backend value are used - others are ignored (a warning will be logged). To add new config fields, edit KnowledgeRetrievalConfig in sources/knowledge_layer/src/register.py.
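To make the "ignored with a warning" behavior concrete, here is a minimal sketch of how options could be filtered per backend. The field-to-backend mapping is taken from the comments in the YAML example above; the `select_backend_options` helper and the logger name are hypothetical, and the real logic in `register.py` may be structured differently.

```python
import logging

logger = logging.getLogger("knowledge_config_sketch")

# Field-to-backend mapping, as annotated in the YAML example.
BACKEND_FIELDS = {
    "llamaindex": {"chroma_dir"},
    "foundational_rag": {"rag_url", "ingest_url", "timeout", "verify_ssl"},
}
ALL_BACKEND_FIELDS = set().union(*BACKEND_FIELDS.values())


def select_backend_options(backend: str, options: dict) -> dict:
    """Keep shared options plus those for `backend`; warn about the rest."""
    allowed = BACKEND_FIELDS[backend]
    selected = {}
    for key, value in options.items():
        if key in ALL_BACKEND_FIELDS and key not in allowed:
            logger.warning("Ignoring %r: not used by backend %r", key, backend)
            continue
        selected[key] = value
    return selected
```

Shared options such as `collection_name` and `top_k` pass through untouched; only fields that belong to a different backend are dropped.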

Switching Backends#

To switch backends, change the backend field and its corresponding options. Here are complete examples for each backend:

LlamaIndex (ChromaDB) - macOS/Linux

functions:
  knowledge_search:
    _type: knowledge_retrieval
    backend: llamaindex
    collection_name: my_docs
    top_k: 5
    chroma_dir: /tmp/chroma_data    # ChromaDB persistence directory

Foundational RAG (Hosted Server)

functions:
  knowledge_search:
    _type: knowledge_retrieval
    backend: foundational_rag
    collection_name: my_docs
    top_k: 5
    rag_url: http://your-server:8081/v1      # RAG server
    ingest_url: http://your-server:8082/v1   # Ingestion server
    timeout: 120

Multimodal Extraction (LlamaIndex Only)#

By default, LlamaIndex ingests text only and uses the NVIDIA hosted embedding models. When AIQ_EXTRACT_IMAGES or AIQ_EXTRACT_CHARTS is enabled, a Vision Language Model (VLM) is used during ingestion to caption embedded images and extract structured data from charts (axis labels, data points, chart type). This makes visual content in PDFs searchable and retrievable alongside text. The VLM is only invoked at ingestion time, not at query time.

All options below can be overridden via environment variables:

Group	Variable	Default	Description
Embedding	`AIQ_EMBED_MODEL`	`nvidia/llama-nemotron-embed-vl-1b-v2`	NVIDIA embedding model
Embedding	`AIQ_EMBED_BASE_URL`	`https://integrate.api.nvidia.com/v1`	Embedding API base URL; override for a local NIM
Extraction Flags	`AIQ_EXTRACT_TABLES`	`false`	Extract tables from PDFs as markdown
Extraction Flags	`AIQ_EXTRACT_IMAGES`	`false`	Extract and caption images with VLM
Extraction Flags	`AIQ_EXTRACT_CHARTS`	`false`	Classify images as charts and extract structured data
Vision Model	`AIQ_VLM_MODEL`	`nvidia/nemotron-nano-12b-v2-vl`	VLM for image captioning
Vision Model	`AIQ_VLM_BASE_URL`	`https://integrate.api.nvidia.com/v1`	VLM API base URL; override for a local NIM

When any extraction flag is enabled, the startup log reports the active mode:

LlamaIndexIngestor initialized: persist_dir=/app/data/chroma_data, mode=text + tables + images

Note: AIQ_EXTRACT_IMAGES and AIQ_EXTRACT_CHARTS work together. If both are enabled, each image is classified by the VLM as either a chart or a regular image. Foundational RAG handles multimodal extraction server-side, so these flags only apply to the LlamaIndex backend.
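The relationship between the extraction flags and the logged mode can be sketched as follows. This is an illustration only: the set of accepted truthy values, the `describe_mode` helper, and the `charts` label are assumptions; only the `text + tables + images` form appears in the sample log line.

```python
import os


def _flag(name: str, env) -> bool:
    """Treat 'true', '1', and 'yes' (any case) as enabled; an assumption."""
    return env.get(name, "false").strip().lower() in {"true", "1", "yes"}


def describe_mode(env=None) -> str:
    """Build a mode string such as 'text + tables + images'."""
    env = os.environ if env is None else env
    parts = ["text"]  # text ingestion is the default and always present
    if _flag("AIQ_EXTRACT_TABLES", env):
        parts.append("tables")
    if _flag("AIQ_EXTRACT_IMAGES", env):
        parts.append("images")
    if _flag("AIQ_EXTRACT_CHARTS", env):
        parts.append("charts")  # label assumed, not shown in the sample log
    return " + ".join(parts)
```

Passing a plain dict instead of relying on `os.environ` makes the helper easy to exercise in isolation.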

Document Summaries#

Document summaries help research agents understand what files are available before making tool calls. When enabled, the knowledge layer generates a one-sentence summary during ingestion and injects it into agent system prompts.

llms:
  summary_llm:
    _type: nim
    model_name: nvidia/nemotron-mini-4b-instruct
    base_url: "https://integrate.api.nvidia.com/v1"
    temperature: 0.3
    max_tokens: 150

functions:
  knowledge_search:
    _type: knowledge_retrieval
    generate_summary: true
    summary_model: summary_llm     # Required: LLM reference from llms: section
    summary_db: ${AIQ_SUMMARY_DB:-sqlite+aiosqlite:///./summaries.db}

When generate_summary: true, you must configure summary_model to reference an LLM from the llms: section. For production deployments, use PostgreSQL for summary_db instead of SQLite.
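The dependency between `generate_summary` and `summary_model` amounts to a simple validation rule. A minimal sketch, assuming the options arrive as a plain dict; the helper names are hypothetical and the actual check inside the knowledge layer may be implemented differently.

```python
def validate_summary_config(config: dict) -> None:
    """Raise ValueError if summaries are enabled without an LLM reference."""
    if config.get("generate_summary") and not config.get("summary_model"):
        raise ValueError(
            "generate_summary is true, so summary_model must name an entry "
            "in the llms: section"
        )


def uses_sqlite(summary_db: str) -> bool:
    """Report whether a summary_db URL points at SQLite (not advised for production)."""
    return summary_db.startswith("sqlite")
```

A deployment check could call `uses_sqlite` on the resolved `summary_db` value and warn when it returns true in a production environment.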

For details on how summaries are stored, how agents consume them, and how to implement summaries in custom backends, refer to the SDK Reference - Document Summaries.

Supported File Types#

File type support depends on the configured backend:

Backend	Supported Types
LlamaIndex	PDF, DOCX, TXT, MD, HTML, JSON, CSV
Foundational RAG	PDF, DOCX, PPTX, TXT, MD, HTML, images (PNG, JPG)

For custom backends, supported types are determined by the backend implementation.

Note: The backends support more types than the frontend currently allows. By default, the frontend accepts only .pdf, .docx, .txt, and .md uploads (a subset common to both backends). Types such as HTML, JSON, CSV, and images are supported by the backends, but the frontend upload flow does not handle them yet; adding that support is a separate task.

To change the accepted types in the frontend, set FILE_UPLOAD_ACCEPTED_TYPES for your deployment method:

Deployment	Where to set
CLI (`start_e2e.sh`)	`deploy/.env`: `FILE_UPLOAD_ACCEPTED_TYPES=.pdf,.docx,.pptx,.txt,.md`
Docker Compose	`deploy/.env` (passed to frontend container automatically)
Helm	`deploy/helm/deployment-k8s/values.yaml` under the frontend app’s `env` section

For Foundational RAG, add .pptx to include PowerPoint support: FILE_UPLOAD_ACCEPTED_TYPES=.pdf,.docx,.pptx,.txt,.md
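The accepted-types setting is a comma-separated list of extensions, so checking a filename against it is straightforward. The sketch below is illustrative Python rather than the frontend's actual implementation; the helper names are hypothetical.

```python
from pathlib import Path

DEFAULT_ACCEPTED = ".pdf,.docx,.txt,.md"  # frontend default described above


def parse_accepted_types(raw: str = DEFAULT_ACCEPTED) -> set:
    """Split a comma-separated extension list into a normalized set."""
    return {ext.strip().lower() for ext in raw.split(",") if ext.strip()}


def is_accepted(filename: str, raw: str = DEFAULT_ACCEPTED) -> bool:
    """Check a filename's extension against the accepted list."""
    return Path(filename).suffix.lower() in parse_accepted_types(raw)
```

With the default list, `is_accepted("slides.pptx")` is false; passing `.pdf,.docx,.pptx,.txt,.md` as the second argument makes it true.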

Programmatic Usage#

import asyncio

# Import the adapter module to trigger registration (the names are otherwise unused)
from knowledge_layer.llamaindex import LlamaIndexRetriever, LlamaIndexIngestor  # noqa: F401

# Use the factory to get instances
from aiq_agent.knowledge import get_retriever, get_ingestor


async def main() -> None:
    # Ingest documents
    ingestor = get_ingestor("llamaindex", config={"persist_dir": "/tmp/chroma"})
    ingestor.create_collection("my_docs")
    file_info = ingestor.upload_file("doc.pdf", "my_docs")

    # Check ingestion status
    status = ingestor.get_file_status(file_info.file_id, "my_docs")
    print(f"Status: {status.status}")  # UPLOADING, INGESTING, SUCCESS, or FAILED

    # Retrieve (retrieve() is awaited, so it must run inside a coroutine)
    retriever = get_retriever("llamaindex", config={"persist_dir": "/tmp/chroma"})
    result = await retriever.retrieve("query", "my_docs", top_k=5)
    for chunk in result.chunks:
        print(f"{chunk.display_citation}: {chunk.content[:100]}")


asyncio.run(main())
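Because ingestion is asynchronous, a single status check usually is not enough; callers typically poll until a terminal state is reached. The sketch below shows one way to do that. `FakeIngestor` and `FakeStatus` are stand-ins so the example runs on its own, statuses are modeled as plain strings (the real objects may use an enum), and `wait_for_ingestion` is a hypothetical helper, not part of the library.

```python
import time
from dataclasses import dataclass

TERMINAL = {"SUCCESS", "FAILED"}


@dataclass
class FakeStatus:
    status: str


class FakeIngestor:
    """Stand-in that walks through the documented status sequence."""

    def __init__(self):
        self._states = iter(["UPLOADING", "INGESTING", "SUCCESS"])
        self._last = "UPLOADING"

    def get_file_status(self, file_id: str, collection: str) -> FakeStatus:
        # Advance one step per call; stay on the last state once exhausted.
        self._last = next(self._states, self._last)
        return FakeStatus(self._last)


def wait_for_ingestion(ingestor, file_id, collection, interval=0.01, timeout=5.0):
    """Poll get_file_status until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = ingestor.get_file_status(file_id, collection).status
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{file_id} still {status} after {timeout}s")
        time.sleep(interval)
```

In real use the polling interval would be seconds rather than milliseconds, and a `FAILED` result should be surfaced to the caller instead of retried blindly.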

Web UI Mode#

Run the backend API server and frontend UI together for document upload, collection management, and chat.

Start Backend#

# Foundational RAG example (requires a deployed Foundational RAG server)
# dotenv loads API keys (NVIDIA_API_KEY, etc.) from deploy/.env
# Additional env vars needed: RAG_SERVER_URL, RAG_INGEST_URL
dotenv -f deploy/.env run nat serve --config_file configs/config_web_frag.yml --host 0.0.0.0 --port 8000

Start Frontend#

cd frontends/ui
npm run dev

Open http://localhost:3000 in your browser.

API Endpoints#

Method	Endpoint	Description
`POST`	`/v1/collections`	Create collection
`GET`	`/v1/collections`	List collections
`GET`	`/v1/collections/{name}`	Get collection details
`DELETE`	`/v1/collections/{name}`	Delete collection
`POST`	`/v1/collections/{name}/documents`	Upload files
`GET`	`/v1/collections/{name}/documents`	List documents in collection
`DELETE`	`/v1/collections/{name}/documents`	Delete files
`GET`	`/v1/documents/{job_id}/status`	Poll ingestion status
`GET`	`/v1/knowledge/health`	Check knowledge backend health
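As a sketch of how a client might address these endpoints, the snippet below builds requests with the standard library without sending them. The base URL assumes the `--port 8000` value from the `nat serve` command above; request and response bodies are not documented here, so none are shown, and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # assumed: port from the nat serve command


def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Create (but do not send) a request for a knowledge API endpoint."""
    data = json.dumps(payload).encode() if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def collection_path(name: str) -> str:
    """Path for a single collection, with the name URL-encoded."""
    return f"/v1/collections/{quote(name, safe='')}"


health = build_request("GET", "/v1/knowledge/health")
delete = build_request("DELETE", collection_path("my docs"))
# To send: urllib.request.urlopen(health, timeout=10)
```

URL-encoding the collection name guards against names containing spaces or slashes.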

Session Collections#

Both LlamaIndex and Foundational RAG support session-based collections (s_<uuid>) created by the UI. Each browser session gets its own isolated collection.

TTL Cleanup#

Collections that have been inactive for 24 hours are deleted automatically, based on their updated_at timestamp. A background thread runs the cleanup hourly.

COLLECTION_TTL_HOURS = 24
TTL_CLEANUP_INTERVAL_SECONDS = 3600
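The expiry rule itself reduces to a timestamp comparison. A minimal sketch, assuming each collection exposes an `updated_at` datetime; the `expired_collections` helper and the sample collection names are hypothetical, and the real cleanup thread may apply additional conditions.

```python
from datetime import datetime, timedelta, timezone

COLLECTION_TTL_HOURS = 24


def expired_collections(updated_at: dict, now: datetime) -> list:
    """Return names whose last update is older than the TTL."""
    cutoff = now - timedelta(hours=COLLECTION_TTL_HOURS)
    return sorted(name for name, ts in updated_at.items() if ts < cutoff)


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
collections = {
    "s_old": now - timedelta(hours=25),    # past the TTL
    "s_fresh": now - timedelta(hours=1),   # recently used
}
print(expired_collections(collections, now))  # ['s_old']
```

A background loop would call this every `TTL_CLEANUP_INTERVAL_SECONDS` and delete the returned collections.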

Architecture#

Core Library (`src/aiq_agent/knowledge/`)#

src/aiq_agent/knowledge/
    __init__.py        # Exports: Chunk, get_retriever, get_ingestor, etc.
    base.py            # Abstract classes: BaseRetriever, BaseIngestor
    schema.py          # Data models: Chunk, RetrievalResult, FileInfo, CollectionInfo
    factory.py         # Registry + factory: register_retriever(), get_retriever()
    summary_store.py   # SQLAlchemy-backed document summary persistence

File	Purpose
`base.py`	Defines the interface all backends must implement
`schema.py`	Universal data models - backends convert native formats to these
`factory.py`	Registration decorators + factory functions for instantiation
`summary_store.py`	Persistent storage for document summaries (SQLite/PostgreSQL)

Backend Adapters (`sources/knowledge_layer/src/`)#

sources/knowledge_layer/src/
    <backend_name>/
        __init__.py      # Imports adapter to trigger registration
        adapter.py       # @register_retriever/@register_ingestor decorated classes
        README.md        # Backend-specific documentation
        pyproject.toml   # Optional: isolated dependencies

How Registration Works#

Backends register themselves using decorators when their module is imported:

# In adapter.py
from aiq_agent.knowledge.base import BaseRetriever, BaseIngestor
from aiq_agent.knowledge.factory import register_retriever, register_ingestor

@register_retriever("my_backend")  # Registration name used in config
class MyRetriever(BaseRetriever):
    ...

@register_ingestor("my_backend")
class MyIngestor(BaseIngestor):
    ...

The registration name (for example, "my_backend") is what you use in:

YAML config: backend: my_backend
Factory calls: get_retriever("my_backend")

Important: The adapter module must be imported for registration to happen. This is why:

__init__.py imports the adapter classes
The NeMo Agent Toolkit function imports from knowledge_layer.<backend>.adapter
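The import requirement follows from how decorator-based registries work: the decorator runs only when the module defining the class is executed. The stand-alone sketch below mirrors the pattern with a simplified registry. It is not the actual `factory.py` code, and the `ValueError` type is an assumption; only the "Unknown backend" message text is taken from the troubleshooting table.

```python
_RETRIEVERS = {}


def register_retriever(name: str):
    """Class decorator that records a retriever class under `name`."""
    def decorator(cls):
        _RETRIEVERS[name] = cls
        return cls
    return decorator


def get_retriever(name: str, config=None):
    """Instantiate the class registered under `name`."""
    if name not in _RETRIEVERS:
        raise ValueError(f"Unknown backend: {name}")
    return _RETRIEVERS[name](config or {})


@register_retriever("my_backend")  # runs at import time of this module
class MyRetriever:
    def __init__(self, config: dict):
        self.config = config
```

If the module containing `MyRetriever` is never imported, the registry stays empty and `get_retriever("my_backend")` fails with the unknown-backend error.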

NeMo Agent Toolkit Integration#

sources/knowledge_layer/src/
    register.py      # @register_function exposes retrieval to agents

register.py defines KnowledgeRetrievalConfig, which maps YAML config to backend instantiation.

Configuration#

Configuration Precedence#

Configuration values are resolved in the following order (highest to lowest priority):

1. Explicit parameter - Values passed directly to factory functions (get_retriever("llamaindex"))
2. YAML config file - The backend: field and other options in your workflow config (recommended)
3. Environment variables - KNOWLEDGE_RETRIEVER_BACKEND, RAG_SERVER_URL, etc.
4. Hardcoded defaults - Built-in fallback values

Recommendation: Use YAML config as your single source of truth for workflow configuration. Environment variables are useful for:

Container deployments (12-factor app pattern)
CI/CD overrides
Secrets management (API keys)
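The precedence order can be expressed as a first-match lookup. The sketch below resolves only the backend name; the `resolve_backend` helper is hypothetical, and the `"llamaindex"` fallback is an assumed default chosen for illustration, not a documented one.

```python
import os


def resolve_backend(explicit=None, yaml_config=None, env=None, default="llamaindex"):
    """Return the first value found, from highest to lowest priority."""
    env = os.environ if env is None else env
    candidates = [
        explicit,                                # 1. explicit parameter
        (yaml_config or {}).get("backend"),      # 2. YAML config file
        env.get("KNOWLEDGE_RETRIEVER_BACKEND"),  # 3. environment variable
        default,                                 # 4. hardcoded default (assumed)
    ]
    return next(value for value in candidates if value)
```

For instance, a `backend:` value in YAML wins over `KNOWLEDGE_RETRIEVER_BACKEND`, and an explicit argument wins over both.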

Environment Variables#

Variable	Backend	Description
`NVIDIA_API_KEY`	All	Required for embeddings/VLM
`KNOWLEDGE_RETRIEVER_BACKEND`	All	Default retriever backend (fallback if not in YAML)
`KNOWLEDGE_INGESTOR_BACKEND`	All	Default ingestor backend (fallback if not in YAML)
`AIQ_CHROMA_DIR`	llamaindex	ChromaDB persistence path
`RAG_SERVER_URL`	foundational_rag	Query server URL (port 8081)
`RAG_INGEST_URL`	foundational_rag	Ingestion server URL (port 8082)
`COLLECTION_NAME`	All	Default collection name

Troubleshooting#

Issue	Cause	Fix
`Unknown backend: my_backend`	Adapter not imported/registered	Import the adapter module before calling factory
`ormsgpack` attribute error	Version conflict with LangGraph	`uv pip install "ormsgpack>=1.5.0"`
Empty retrieval results	Collection empty	Run ingestion first, verify collection name matches
Job status 404	Job was created in a different process or instance	The factory uses per-process singletons - poll from the same process that started the job
`milvus-lite` required	Missing dependency	`uv pip install "pymilvus[milvus_lite]"`
Backend registered twice	Module imported multiple times	Normal - factory logs warning but works fine

Debug Registration#

# Check what's registered
from aiq_agent.knowledge.factory import list_retrievers, list_ingestors, get_knowledge_layer_config

print("Retrievers:", list_retrievers())
print("Ingestors:", list_ingestors())
print("Full config:", get_knowledge_layer_config())

Document	Description
SDK Reference	Build custom backend adapters - data schemas, interfaces, full implementation example
Foundational RAG Setup (`sources/knowledge_layer/src/foundational_rag/README.md`)	Production deployment with NVIDIA RAG Blueprint