Documentation Development

Set Up the Documentation Environment

Before building or serving the documentation, set up the docs environment using the Makefile:

make docs-env
source .venv-docs/bin/activate

This creates a virtual environment in .venv-docs and installs the dependencies required to build the documentation.

Build the Documentation

To build the NeMo Curator documentation, run:

make docs-html

  • The HTML output is written to docs/_build/html.

  • The generated Python API docs are written to docs/apidocs.

Build Variants

The documentation supports different build variants:

  • make docs-html - Default build (includes all content)

  • make docs-html-ga - GA (General Availability) build (excludes EA-only content)

  • make docs-html-ea - EA (Early Access) build (includes all content)

Live Building

To serve the documentation with live updates as you edit, run:

make docs-live

Open a web browser and go to http://localhost:8000 (or the port shown in your terminal) to view the output.

Conditional Content for Different Build Types

The documentation system supports three ways to conditionally include/exclude content based on build tags (e.g., GA vs EA builds). All methods use the unified :only: syntax.

2. Grid Card Conditional Rendering

Hide specific grid cards from certain builds:

:::{grid-item-card} Video Curation Features
:link: video-overview  
:link-type: ref
:only: not ga
Content for EA-only features
+++
{bdg-secondary}`early-access`
:::

3. Toctree Conditional Rendering

Control navigation entries conditionally:

# Global toctree condition (hides entire section)
::::{toctree}
:hidden:
:caption: Early Access Features
:only: not ga

ea-feature1.md
ea-feature2.md  
::::

# Inline entry conditions (hides individual entries)
::::{toctree}
:hidden:
:caption: Documentation

standard-doc.md
ea-only-doc.md :only: not ga
another-standard-doc.md
::::
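As a rough illustration of the inline form above, the following sketch filters toctree entries against a set of active build tags. It is not the project's implementation: the function names, the restriction to `tag` and `not tag` conditions, and the assumption that a GA build activates a tag named `ga` are all made up for the example.

```python
def condition_holds(condition: str, active_tags: set[str]) -> bool:
    """Evaluate a minimal `:only:` condition: either `tag` or `not tag`."""
    words = condition.split()
    if len(words) == 1:
        return words[0] in active_tags
    if len(words) == 2 and words[0] == "not":
        return words[1] not in active_tags
    raise ValueError(f"unsupported condition: {condition!r}")


def filter_entries(entries: list[str], active_tags: set[str]) -> list[str]:
    """Drop toctree entries whose inline `:only:` condition is not met."""
    kept = []
    for entry in entries:
        # Entries without a marker are unconditional and always kept.
        doc, marker, condition = entry.partition(":only:")
        if marker and not condition_holds(condition, active_tags):
            continue
        kept.append(doc.strip())
    return kept


entries = [
    "standard-doc.md",
    "ea-only-doc.md :only: not ga",
    "another-standard-doc.md",
]
ga_entries = filter_entries(entries, {"ga"})      # GA build: EA-only entry dropped
default_entries = filter_entries(entries, set())  # default build: everything kept
```

The same `condition_holds` check would apply to the global toctree and grid card forms, where the condition guards the whole directive instead of a single entry.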

Best Practices

  • Use file-level exclusion for entire documentation sections (better performance, no warnings)

  • Use grid/toctree conditions for fine-grained control within shared documents

  • Be consistent with condition syntax across all methods

  • Test both build variants to ensure content appears/disappears correctly

Testing Conditional Content

# Test default build (includes all content)
make docs-html

# Test GA build (excludes EA-only content)  
make docs-html-ga

# Verify content is properly hidden/shown in each build
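The verification step can be partly automated. The sketch below checks whether pages that should be excluded still exist in a build output directory. The helper name, the output path, and the page names are hypothetical, and the check only covers whole pages removed from a build; cards or toctree entries hidden inside shared pages still need to be inspected in the rendered HTML.

```python
from pathlib import Path


def find_leaked_pages(build_dir: Path, ea_only_pages: list[str]) -> list[str]:
    """Return the EA-only pages that still exist under build_dir."""
    return [page for page in ea_only_pages if (build_dir / page).is_file()]


# Example call after `make docs-html-ga`; path and page names are placeholders:
# leaked = find_leaked_pages(Path("docs/_build/html"), ["ea-feature1.html"])
```

An empty result means none of the listed pages were found in that build.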

Run Doctests (if present)

Sphinx is configured to run doctests in both Python docstrings and Markdown code blocks that use the {doctest} directive. The codebase does not yet contain real doctest examples; the only one is the example below in this README. If you add doctest examples, run them manually with:

source .venv-docs/bin/activate
cd docs
sphinx-build -b doctest . _build/doctest

There is currently no Makefile target for running doctests; you must use the above command directly.

Example: How to Write Doctests in Documentation

Sphinx tests any code in a triple-backtick block marked with the {doctest} directive. The format follows the syntax of Python’s doctest module: >>> marks Python input, and the following line shows the expected output. For example:

def add(x: int, y: int) -> int:
    """
    Adds two integers together.

    Args:
        x (int): The first integer to add.
        y (int): The second integer to add.

    Returns:
        int: The sum of x and y.

    Examples:
    ```{doctest}
    >>> add(1, 2)
    3
    ```

    """
    return x + y
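For a quick check outside Sphinx, the same `>>>` format can be run with Python's standard `doctest` module. The sketch below is a minimal illustration, not part of the project's tooling. It uses a plain docstring without the {doctest} fence, because the standard-library parser reads every non-blank line after the input as expected output, so a closing fence would be compared against the result.

```python
import doctest


def add(x: int, y: int) -> int:
    """Add two integers.

    >>> add(1, 2)
    3
    """
    return x + y


# Parse the docstring into a DocTest and run it with `add` in scope.
parser = doctest.DocTestParser()
test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)  # reports how many examples failed and how many were attempted
```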

MyST Substitutions in Code Blocks

The documentation uses a custom Sphinx extension (myst_codeblock_substitutions) that enables MyST substitutions to work inside standard code blocks. This allows you to maintain consistent variables (like version numbers, URLs, product names) across your documentation while preserving template syntax in YAML and other template languages.

Configuration

The extension is configured in docs/conf.py:

# Add the extension
extensions = [
    # ... other extensions
    "myst_codeblock_substitutions",  # Our custom MyST substitutions in code blocks
]

# Define reusable variables
myst_substitutions = {
    "product_name": "NeMo Curator",
    "product_name_short": "Curator", 
    "company": "NVIDIA",
    "version": release,  # Uses the release variable from conf.py
    "current_year": "2025",
    "github_repo": "https://github.com/NVIDIA/NeMo-Curator",
    "docs_url": "https://docs.nvidia.com/nemo-curator",
    "support_email": "nemo-curator-support@nvidia.com",
    "min_python_version": "3.8",
    "recommended_cuda": "12.0+",
}

Usage

Basic MyST Substitutions in Text

Use {{ variable }} syntax in regular markdown text:

Welcome to {{ product_name }} version {{ version }}!

{{ product_name_short }} is developed by {{ company }}.
For support, contact {{ support_email }}.

MyST Substitutions in Code Blocks

The extension enables substitutions in standard code blocks:

# Install {{ product_name }} version {{ version }}
helm install my-release oci://nvcr.io/nvidia/nemo-curator --version {{ version }}
kubectl get pods -n {{ product_name_short }}-namespace
docker run --rm nvcr.io/nvidia/nemo-curator:{{ version }}
pip install nemo-curator=={{ version }}

Template Language Protection

The extension protects template syntax from unwanted substitutions in two ways: by code block language and by pattern.

Protected Languages

These languages are treated carefully to preserve their native {{ }} syntax:

  • yaml, yml (Kubernetes, Docker Compose)

  • helm, gotmpl, go-template (Helm charts)

  • jinja, jinja2, j2 (Ansible, Python templates)

  • ansible (Ansible playbooks)

  • handlebars, hbs, mustache (JavaScript templates)

  • django, twig, liquid, smarty (Web framework templates)

Pattern Protection

The extension automatically detects and preserves common template patterns:

  • {{ .Values.something }} (Helm values)

  • {{ ansible_variable }} (Ansible variables)

  • {{ item.property }} (Template loops)

  • {{- variable }} (Whitespace control)

  • {{ range ... }}, {{ if ... }} (Control structures)
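One way to get this behavior is to substitute only `{{ name }}` tokens whose name is a plain identifier that appears in the substitution map, and to leave every other `{{ ... }}` expression untouched. The sketch below illustrates that idea; it is an assumption about how such protection could work, not the actual code of myst_codeblock_substitutions. The regular expression, the function name, and the sample values are hypothetical.

```python
import re

# Hypothetical subset of the myst_substitutions mapping from conf.py.
SUBSTITUTIONS = {"version": "0.25.7", "product_name": "NeMo Curator"}

# Matches `{{ name }}` only when name is a plain identifier: no dots, pipes,
# quotes, leading dashes, or keywords followed by arguments.
SIMPLE_VAR = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def substitute(text: str, values: dict[str, str]) -> str:
    """Replace known simple variables; leave all other template syntax as-is."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names (for example Ansible variables) are returned unchanged.
        return values.get(name, match.group(0))

    return SIMPLE_VAR.sub(replace, text)


pip_line = substitute("pip install nemo-curator=={{ version }}", SUBSTITUTIONS)
helm_line = substitute("tag: {{ .Values.image.tag }}", SUBSTITUTIONS)
```

A limitation of this approach is that a template variable sharing its name with a substitution key, such as a Jinja variable called `version`, would still be replaced, so key names are best chosen to avoid collisions.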

Mixed Usage Examples

YAML with Mixed Syntax

# values.yaml - MyST substitutions work alongside Helm templates
image:
  repository: nvcr.io/nvidia/nemo-curator
  tag: {{ .Values.image.tag | default "latest" }}        # ← Helm template (preserved)

# Documentation URLs using MyST substitutions  
downloads:
  releaseUrl: "{{ github_repo }}/releases/download/v{{ version }}/nemo-curator.tar.gz"  # ← MyST substitution
  docsUrl: "{{ docs_url }}"                              # ← MyST substitution
  supportEmail: "{{ support_email }}"                    # ← MyST substitution

service:
  name: {{ include "nemo-curator.fullname" . }}          # ← Helm template (preserved)
  
env:
  - name: CURATOR_VERSION
    value: "{{ .Chart.AppVersion }}"                     # ← Helm template (preserved)
  - name: DOCS_VERSION  
    value: "{{ version }}"                               # ← MyST substitution

Ansible with Mixed Syntax

# MyST substitutions for documentation
- name: "Install {{ product_name }} version {{ version }}"     # ← MyST substitution
  shell: |
    wget {{ github_repo }}/releases/download/v{{ version }}/nemo-curator.tar.gz  # ← MyST substitution
    
  # Ansible templates preserved
  when: "{{ ansible_distribution }} == 'Ubuntu'"              # ← Ansible template (preserved)
  notify: "{{ handlers.restart_service }}"                    # ← Ansible template (preserved)

Benefits

  1. Single Source of Truth: Update version numbers, URLs, and product names in one place (conf.py)

  2. Template Safety: Won’t break existing Helm charts, Ansible playbooks, or other templates

  3. Intelligent Processing: Only processes simple variable names, preserves complex template syntax

  4. Cross-Format Support: Works in bash, python, dockerfile, and other code blocks

  5. Maintainability: Reduces copy-paste errors and keeps documentation in sync with releases

The extension handles mixed template syntax automatically, so you can use MyST substitutions in code blocks without breaking existing templates.