🎨 Data Designer Tutorial: Providing Images as Context for Vision-Based Data Generation

📚 What you'll learn

This notebook demonstrates how to provide images as context to generate text descriptions using vision-language models. The same multi_modal_context field can also carry audio or video context when the selected model supports those modalities.

✨ Visual Document Processing: Converting images to chat-ready format for model consumption
🔍 Vision-Language Generation: Using vision models to generate detailed summaries from images
🧩 Media Context Pattern: Understanding how ImageContext, AudioContext, and VideoContext fit into the same configuration field

If this is your first time using Data Designer, we recommend starting with the first notebook in this tutorial series.

📦 Import Data Designer

data_designer.config provides access to the configuration API.
DataDesigner is the main interface for data generation.

Python

# Standard library imports
import base64
import io
import uuid

# Third-party imports
import pandas as pd
import rich
from datasets import load_dataset
from IPython.display import display
from rich.panel import Panel

# Data Designer imports
import data_designer.config as dd
from data_designer.interface import DataDesigner

⚙️ Initialize the Data Designer interface

DataDesigner is the main object responsible for managing the data generation process.
When initialized without arguments, the default model providers are used.

Python

data_designer = DataDesigner()

🏗️ Initialize the Data Designer Config Builder

The Data Designer config defines the dataset schema and generation process.
The config builder provides an intuitive interface for building this configuration.
When initialized without arguments, the default model configurations are used.

Python

config_builder = dd.DataDesignerConfigBuilder()

🌱 Seed Dataset Creation

In this section, we'll prepare our visual documents as a seed dataset for summarization:

Loading Visual Documents: We use a small pets image dataset containing labeled images
Image Processing: Convert images to base64 format for vision model consumption
Metadata Extraction: Preserve relevant image information (label, etc.)

The seed dataset will be used to generate detailed text descriptions of each image.

Python

# Dataset processing configuration
IMG_COUNT = 512  # Number of images to keep in the seed dataset
BASE64_IMAGE_HEIGHT = 512  # Standardized height (pixels) for model input

# Load the pets dataset (train split, ~23 MB total)
img_dataset_cfg = {"path": "rokmr/pets", "split": "train"}

Python

def resize_image(image, height: int):
    """
    Resize image while maintaining aspect ratio.

    Args:
        image: PIL Image object
        height: Target height in pixels

    Returns:
        Resized PIL Image object
    """
    original_width, original_height = image.size
    width = int(original_width * (height / original_height))
    return image.resize((width, height))


def convert_image_to_chat_format(record, height: int) -> dict:
    """
    Convert PIL image to base64 format for chat template usage.

    Args:
        record: Dataset record containing image and metadata
        height: Target height for image resizing

    Returns:
        Updated record with base64_image and uuid fields
    """
    image = resize_image(record["image"], height)

    img_buffer = io.BytesIO()
    image.save(img_buffer, format="PNG")
    byte_data = img_buffer.getvalue()
    base64_encoded_data = base64.b64encode(byte_data)
    base64_string = base64_encoded_data.decode("utf-8")

    return record | {"base64_image": base64_string, "uuid": str(uuid.uuid4())}
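The width calculation in resize_image can be checked in isolation. The sketch below repeats the same arithmetic in a hypothetical helper, scaled_width, which is not part of the notebook; the source sizes are made up purely to illustrate the formula. Note that int() truncates rather than rounds, so the resized width can be up to one pixel narrower than the exact aspect-ratio value.

```python
def scaled_width(original_width: int, original_height: int, target_height: int) -> int:
    """Width that preserves the aspect ratio at target_height (truncated, as in resize_image)."""
    return int(original_width * (target_height / original_height))


# Hypothetical source sizes, chosen only to illustrate the arithmetic.
print(scaled_width(968, 1024, 512))   # exact halving of the width
print(scaled_width(1000, 750, 512))   # 682.67 is truncated by int()
```

If sub-pixel accuracy matters for your images, replacing int() with round() is one option; the notebook itself uses truncation.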

Python

# Load and process the image dataset
print("📥 Loading and processing images...")

# Note: map() converts every record in the split; the slice below keeps the first IMG_COUNT.
img_dataset = load_dataset(**img_dataset_cfg).map(
    convert_image_to_chat_format, fn_kwargs={"height": BASE64_IMAGE_HEIGHT}
)
img_dataset = pd.DataFrame(img_dataset[:IMG_COUNT])

print(f"✅ Loaded {len(img_dataset)} images with columns: {list(img_dataset.columns)}")

Output

[13:24:13] [WARNING] Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
📥 Loading and processing images...
README.md: 0.00B [00:00, ?B/s]
dataset_infos.json: 0.00B [00:00, ?B/s]
data/train.zip:   0%|          | 0.00/20.4M [00:00<?, ?B/s]
data/test.zip:   0%|          | 0.00/3.29M [00:00<?, ?B/s]
Generating train split:   0%|          | 0/900 [00:00<?, ? examples/s]
Generating test split:   0%|          | 0/150 [00:00<?, ? examples/s]
Map:   0%|          | 0/900 [00:00<?, ? examples/s]
✅ Loaded 512 images with columns: ['image', 'label', 'base64_image', 'uuid']

Python

img_dataset.head()

Output

  
      
	image	label	base64_image	uuid
0	<PIL.JpegImagePlugin.JpegImageFile image mode=...	0	iVBORw0KGgoAAAANSUhEUgAAAeQAAAIACAIAAADc8YinAA...	87f84627-9911-4344-9e18-07d39c8f36d1
1	<PIL.JpegImagePlugin.JpegImageFile image mode=...	0	iVBORw0KGgoAAAANSUhEUgAAAiQAAAIACAIAAAA9rOAHAA...	c8ae8bad-9f5b-40fc-b292-3662b5a9d742
2	<PIL.JpegImagePlugin.JpegImageFile image mode=...	0	iVBORw0KGgoAAAANSUhEUgAAAqoAAAIACAIAAADFYNm1AA...	85eb94d7-a00d-436a-a75c-f3a867b6c64c
3	<PIL.JpegImagePlugin.JpegImageFile image mode=...	0	iVBORw0KGgoAAAANSUhEUgAAAwAAAAIACAIAAAC6lJxtAA...	a4095d29-0b51-4c1c-a3be-003f66f3dc1b
4	<PIL.PngImagePlugin.PngImageFile image mode=RG...	0	iVBORw0KGgoAAAANSUhEUgAAAqoAAAIACAIAAADFYNm1AA...	7db77bef-1bca-4d57-babd-6426ff5632af
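A quick way to sanity-check an encoded image without decoding the whole payload is to read the PNG header. In a PNG file the first 24 bytes are the 8-byte signature, the IHDR chunk length and type, and then the width and height as big-endian 32-bit integers; 24 bytes correspond to exactly 32 base64 characters. The sketch below uses only the standard library. The helper name png_size_from_base64 is hypothetical, and the sample prefix is copied from row 0 of the output above.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_base64(b64_string: str) -> tuple[int, int]:
    """Return (width, height) read from the IHDR header of a base64-encoded PNG."""
    # 32 base64 characters decode to the first 24 bytes of the file.
    header = base64.b64decode(b64_string[:32])
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("payload does not start with a PNG header")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


# First 32 characters of the base64_image value shown for row 0.
print(png_size_from_base64("iVBORw0KGgoAAAANSUhEUgAAAeQAAAIA"))
```

For row 0 this reports a height of 512, matching BASE64_IMAGE_HEIGHT, which suggests the resize step ran before encoding.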

Python

# Add the seed dataset containing our processed images
df_seed = pd.DataFrame(img_dataset)[["uuid", "label", "base64_image"]]
config_builder.with_seed_dataset(dd.DataFrameSeedSource(df=df_seed))

Output

DataDesignerConfigBuilder(
    seed_dataset: df seed
)

🧩 Media context and model capabilities

multi_modal_context accepts media context descriptors such as ImageContext, AudioContext, and VideoContext. Data Designer reads the referenced seed columns and serializes them for the model request, but the selected model still determines which modalities are valid.

This notebook uses image context only because image-capable VLMs are broadly available. Before combining image, audio, and video in one column, choose a model alias backed by an omni or otherwise modality-compatible model, and check that the provider accepts every context type you send.

For base64 seed columns, store the raw base64 payload without a data:<media-type>;base64, prefix and specify the media format on the context object:

media_context = [
    dd.ImageContext(
        column_name="image_base64",
        data_type=dd.ModalityDataType.BASE64,
        image_format=dd.ImageFormat.PNG,
    ),
    dd.AudioContext(
        column_name="audio_base64",
        data_type=dd.ModalityDataType.BASE64,
        audio_format=dd.AudioFormat.MP3,
    ),
    dd.VideoContext(
        column_name="video_base64",
        data_type=dd.ModalityDataType.BASE64,
        video_format=dd.VideoFormat.MP4,
    ),
]

URL-backed media can use data_type=dd.ModalityDataType.URL, subject to the provider's URL support and file-size limits. Local audio/video paths also require explicit URL mode, and the model endpoint must have filesystem access to the same paths; this typically means a colocated vLLM server configured for local media access.
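If your seed data already contains data URIs rather than raw base64, the prefix has to be removed before the column is used in base64 mode, as described above. The following is a minimal sketch using only the standard library; strip_data_uri_prefix is a hypothetical helper, not part of Data Designer.

```python
def strip_data_uri_prefix(value: str) -> str:
    """Return the raw base64 payload, dropping a leading 'data:<media-type>;base64,' if present."""
    marker = ";base64,"
    if value.startswith("data:") and marker in value:
        return value.split(marker, 1)[1]
    return value  # Already a raw payload; leave it untouched.


print(strip_data_uri_prefix("data:image/png;base64,iVBORw0KGgo="))
print(strip_data_uri_prefix("iVBORw0KGgo="))
```

With a pandas seed DataFrame, the helper could be applied to a column with something like df["base64_image"].map(strip_data_uri_prefix) before passing the DataFrame to the config builder.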

Python

# Add a column to generate detailed image descriptions
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        model_alias="nvidia-vision",
        prompt=(
            "Provide a detailed description of the content in this image in Markdown format. "
            "Describe the main subject, background, colors, and any notable details."
        ),
        multi_modal_context=[dd.ImageContext(column_name="base64_image")],
    )
)

data_designer.validate(config_builder)

Output

[13:25:53] [INFO] ✅ Validation passed

🔁 Iteration is key – preview the dataset!

Use the preview method to generate a sample of records quickly.
Inspect the results for quality and format issues.
Adjust column configurations, prompts, or parameters as needed.
Re-run the preview until satisfied.

Python

preview = data_designer.preview(config_builder, num_records=2)

Output

[13:25:53] [INFO] 👁️ Preview generation in progress
[13:25:53] [INFO]   |-- 🔒 Jinja rendering engine: secure
[13:25:53] [INFO] ✅ Validation passed
[13:25:53] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[13:25:53] [INFO] 🩺 Running health checks for models...
[13:25:53] [INFO]   |-- 👀 Checking 'nvidia/nemotron-3-nano-omni-30b-a3b-reasoning' in provider named 'nvidia' for model alias 'nvidia-vision'...
[13:25:55] [INFO]   |-- ✅ Passed!
[13:25:55] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue preview
[13:25:55] [INFO] 📝 llm-text model config for column 'description'
[13:25:55] [INFO]   |-- model: 'nvidia/nemotron-3-nano-omni-30b-a3b-reasoning'
[13:25:55] [INFO]   |-- model alias: 'nvidia-vision'
[13:25:55] [INFO]   |-- model provider: 'nvidia'
[13:25:55] [INFO]   |-- inference parameters:
[13:25:55] [INFO]   |  |-- generation_type=chat-completion
[13:25:55] [INFO]   |  |-- max_parallel_requests=4
[13:25:55] [INFO]   |  |-- temperature=0.60
[13:25:55] [INFO]   |  |-- top_p=0.95
[13:25:55] [INFO] ⚡️ Async generation: 1 column(s) (description), 2 tasks across 1 row group(s)
[13:25:55] [INFO] 🚀 (1/1) Dispatching with 2 records
[13:25:55] [INFO] 🌱 (1/1) Sampling 2 records from seed dataset
[13:25:55] [INFO]   |-- seed dataset size: 512 records
[13:25:55] [INFO]   |-- sampling strategy: ordered
[13:26:00] [INFO] 📊 Progress [4.8s]:
[13:26:00] [INFO]   |-- 🤩 description: 2/2 (100%) 0.4 rec/s
[13:26:00] [INFO] ✅ Async generation complete [4.8s]: 2 ok, 0 failed across 1 column(s)
[13:26:00] [INFO] 📊 Model usage summary:
[13:26:00] [INFO]   |-- model: nvidia/nemotron-3-nano-omni-30b-a3b-reasoning
[13:26:00] [INFO]   |-- tokens: input=658, output=1985, reasoning=1198 (estimated), total=2643, tps=545
[13:26:00] [INFO]   |-- reasoning token count estimated with tiktoken
[13:26:00] [INFO]   |-- requests: success=2, failed=0, total=2, rpm=24
[13:26:00] [INFO] 📐 Measuring dataset column statistics:
[13:26:00] [INFO]   |-- 📝 column: 'description'
[13:26:00] [INFO] 🥳 Preview complete!
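The model usage summary in the log can be read programmatically if you want to track token spend across preview runs. The sketch below parses the tokens line from the output above with a regular expression; parse_token_usage is a hypothetical helper and the log format is assumed from this one run, so treat it as illustrative rather than a stable interface. In this run, input plus output equals total, and the estimated reasoning tokens appear to be counted within the output figure.

```python
import re


def parse_token_usage(line: str) -> dict[str, int]:
    """Extract every integer 'key=value' pair from a usage summary line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


# Copied from the preview log above.
usage = parse_token_usage(
    "tokens: input=658, output=1985, reasoning=1198 (estimated), total=2643, tps=545"
)
print(usage)
print(usage["input"] + usage["output"] == usage["total"])
```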

Python

# Run this cell multiple times to cycle through the 2 preview records.
preview.display_sample_record()

Output

[index: 0]
                                                                                                              
                                                 Seed Columns                                                 
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name         ┃ Value                                                                                       ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ uuid         │ 87f84627-9911-4344-9e18-07d39c8f36d1                                                        │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────────┤
│ label        │ 0                                                                                           │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────────┤
│ base64_image │ iVBORw0KGgoAAAANSUhEUgAAAeQAAAIACAIAAADc8YinAAEAAElEQVR4nOy9V5ckuZEmamZwEREpSna1YAv28JLDHT… │
└──────────────┴─────────────────────────────────────────────────────────────────────────────────────────────┘
                                                                                                              
                                                                                                              
                                              Generated Columns                                               
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name        ┃ Value                                                                                        ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ description │ # Close-up Portrait of a Black and White Cat                                                 │
│             │                                                                                              │
│             │ ## Main Subject                                                                              │
│             │ The image features a close-up shot of a domestic cat, likely a tuxedo or bicolor breed. The  │
│             │ cat is positioned centrally, filling most of the frame from the chest up. It is looking      │
│             │ slightly upward and forward with an attentive, wide-eyed expression.                         │
│             │                                                                                              │
│             │ *   **Eyes:** The cat has large, round eyes that are a striking yellow-green color with      │
│             │ vertical black pupils. The gaze is intense and focused.                                      │
│             │ *   **Fur Pattern:** The fur is distinctly two-toned. The ears, the sides of the head, and   │
│             │ patches around the eyes are black. A broad stripe of white fur runs down the center of the   │
│             │ forehead, between the eyes, and covers the nose, mouth area, and chest.                      │
│             │ *   **Face:** The nose is small, black, and triangular. The mouth is closed in a neutral,    │
│             │ slightly downturned line, giving the cat a somewhat serious or curious look. Long, thin      │
│             │ white whiskers extend outward from the muzzle on both sides.                                 │
│             │ *   **Ears:** The ears are pointed and upright, indicating alertness. The inside of the ears │
│             │ shows some lighter fur mixed with black.                                                     │
│             │                                                                                              │
│             │ ## Background                                                                                │
│             │ The background is simple and out of focus, which helps emphasize the cat as the main         │
│             │ subject.                                                                                     │
│             │ *   **Left/Top:** A plain, light-colored wall (appearing off-white or very light grey).      │
│             │ *   **Right:** A vertical section of a light brown, possibly wooden surface, likely a door   │
│             │ frame or furniture edge.                                                                     │
│             │                                                                                              │
│             │ ## Colors and Lighting                                                                       │
│             │ *   **Color Palette:** The dominant colors are black, white, and the yellow-green of the     │
│             │ eyes. The background introduces neutral tones of white/grey and tan/brown.                   │
│             │ *   **Lighting:** The lighting appears to be soft and diffuse, coming from the front. It     │
│             │ illuminates the cat's face evenly without creating harsh shadows, highlighting the texture   │
│             │ of the fur and the shine in the eyes.                                                        │
└─────────────┴──────────────────────────────────────────────────────────────────────────────────────────────┘
                                                                                                              

Python

# The preview dataset is available as a pandas DataFrame.
preview.dataset

Output

  
      
	uuid	label	base64_image	description
0	87f84627-9911-4344-9e18-07d39c8f36d1	0	iVBORw0KGgoAAAANSUhEUgAAAeQAAAIACAIAAADc8YinAA...	# Close-up Portrait of a Black and White Cat\n...
1	c8ae8bad-9f5b-40fc-b292-3662b5a9d742	0	iVBORw0KGgoAAAANSUhEUgAAAiQAAAIACAIAAAA9rOAHAA...	# Detailed Description of the Image\n\n**Main ...

📊 Analyze the generated data

Data Designer automatically generates a basic statistical analysis of the generated data.
This analysis is available via the analysis property of generation result objects.

Python

# Print the analysis as a table.
preview.analysis.to_report()

Output

──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records               ┃ number of columns               ┃ percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2                               │ 1                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                  ┃               ┃                              ┃       prompt tokens ┃       completion tokens ┃
┃ column name      ┃     data type ┃         number unique values ┃          per record ┃              per record ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ description      │        string │                   2 (100.0%) │        29.0 +/- 0.0 │          383.0 +/- 25.5 │
└──────────────────┴───────────────┴──────────────────────────────┴─────────────────────┴─────────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
│                                                                                                                 │
│  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              │
│  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               │
│                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

🔎 Visual Inspection

Let's compare the original image with the generated description to validate quality:

Python

# Compare original image with generated description
index = 0  # Change this to view different examples

# Merge preview data with original images for comparison
comparison_dataset = preview.dataset.merge(pd.DataFrame(img_dataset)[["uuid", "image"]], how="left", on="uuid")

# Extract the record for display
record = comparison_dataset.iloc[index]

print("📄 Original Image:")
display(resize_image(record.image, BASE64_IMAGE_HEIGHT))

print("\n📝 Generated Description:")
rich.print(Panel(record.description, title="Image Description", title_align="left"))

Output

📄 Original Image:


📝 Generated Description:

╭─ Image Description ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ # Close-up Portrait of a Black and White Cat                                                                    │
│                                                                                                                 │
│ ## Main Subject                                                                                                 │
│ The image features a close-up shot of a domestic cat, likely a tuxedo or bicolor breed. The cat is positioned   │
│ centrally, filling most of the frame from the chest up. It is looking slightly upward and forward with an       │
│ attentive, wide-eyed expression.                                                                                │
│                                                                                                                 │
│ *   **Eyes:** The cat has large, round eyes that are a striking yellow-green color with vertical black pupils.  │
│ The gaze is intense and focused.                                                                                │
│ *   **Fur Pattern:** The fur is distinctly two-toned. The ears, the sides of the head, and patches around the   │
│ eyes are black. A broad stripe of white fur runs down the center of the forehead, between the eyes, and covers  │
│ the nose, mouth area, and chest.                                                                                │
│ *   **Face:** The nose is small, black, and triangular. The mouth is closed in a neutral, slightly downturned   │
│ line, giving the cat a somewhat serious or curious look. Long, thin white whiskers extend outward from the      │
│ muzzle on both sides.                                                                                           │
│ *   **Ears:** The ears are pointed and upright, indicating alertness. The inside of the ears shows some lighter │
│ fur mixed with black.                                                                                           │
│                                                                                                                 │
│ ## Background                                                                                                   │
│ The background is simple and out of focus, which helps emphasize the cat as the main subject.                   │
│ *   **Left/Top:** A plain, light-colored wall (appearing off-white or very light grey).                         │
│ *   **Right:** A vertical section of a light brown, possibly wooden surface, likely a door frame or furniture   │
│ edge.                                                                                                           │
│                                                                                                                 │
│ ## Colors and Lighting                                                                                          │
│ *   **Color Palette:** The dominant colors are black, white, and the yellow-green of the eyes. The background   │
│ introduces neutral tones of white/grey and tan/brown.                                                           │
│ *   **Lighting:** The lighting appears to be soft and diffuse, coming from the front. It illuminates the cat's  │
│ face evenly without creating harsh shadows, highlighting the texture of the fur and the shine in the eyes.      │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

🆙 Scale up!

Happy with your preview data?
Use the create method to submit larger Data Designer generation jobs.

Python

results = data_designer.create(config_builder, num_records=10, dataset_name="tutorial-4")

Output

[13:26:00] [INFO] 🎨 Creating Data Designer dataset
[13:26:00] [INFO]   |-- 🔒 Jinja rendering engine: secure
[13:26:00] [INFO] ✅ Validation passed
[13:26:00] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[13:26:00] [INFO] 🩺 Running health checks for models...
[13:26:00] [INFO]   |-- 👀 Checking 'nvidia/nemotron-3-nano-omni-30b-a3b-reasoning' in provider named 'nvidia' for model alias 'nvidia-vision'...
[13:26:00] [INFO]   |-- ✅ Passed!
[13:26:00] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue builder
[13:26:00] [INFO] 📝 llm-text model config for column 'description'
[13:26:00] [INFO]   |-- model: 'nvidia/nemotron-3-nano-omni-30b-a3b-reasoning'
[13:26:00] [INFO]   |-- model alias: 'nvidia-vision'
[13:26:00] [INFO]   |-- model provider: 'nvidia'
[13:26:00] [INFO]   |-- inference parameters:
[13:26:00] [INFO]   |  |-- generation_type=chat-completion
[13:26:00] [INFO]   |  |-- max_parallel_requests=4
[13:26:00] [INFO]   |  |-- temperature=0.60
[13:26:00] [INFO]   |  |-- top_p=0.95
[13:26:00] [INFO] ⚡️ Async generation: 1 column(s) (description), 10 tasks across 1 row group(s)
[13:26:00] [INFO] 🚀 (1/1) Dispatching with 10 records
[13:26:00] [INFO] 🌱 (1/1) Sampling 10 records from seed dataset
[13:26:00] [INFO]   |-- seed dataset size: 512 records
[13:26:00] [INFO]   |-- sampling strategy: ordered
[13:26:05] [INFO] 📊 Progress [5.1s]:
[13:26:05] [INFO]   |-- 🌦️ description: 3/10 (30%) 0.6 rec/s
[13:26:11] [INFO] 📊 Progress [10.5s]:
[13:26:11] [INFO]   |-- ⛅ description: 6/10 (60%) 0.6 rec/s
[13:26:16] [INFO] 📊 Progress [15.5s]:
[13:26:16] [INFO]   |-- ☀️ description: 10/10 (100%) 0.6 rec/s
[13:26:16] [INFO] ✅ Async generation complete [15.6s]: 10 ok, 0 failed across 1 column(s)
[13:26:16] [INFO] 📊 Model usage summary:
[13:26:16] [INFO]   |-- model: nvidia/nemotron-3-nano-omni-30b-a3b-reasoning
[13:26:16] [INFO]   |-- tokens: input=3720, output=9214, reasoning=5820 (estimated), total=12934, tps=822
[13:26:16] [INFO]   |-- reasoning token count estimated with tiktoken
[13:26:16] [INFO]   |-- requests: success=10, failed=0, total=10, rpm=38
[13:26:16] [INFO] 📐 Measuring dataset column statistics:
[13:26:16] [INFO]   |-- 📝 column: 'description'

Python

# Load the generated dataset as a pandas DataFrame.
dataset = results.load_dataset()

dataset.head()

Output

  
      
	uuid	label	base64_image	description
0	87f84627-9911-4344-9e18-07d39c8f36d1	0	iVBORw0KGgoAAAANSUhEUgAAAeQAAAIACAIAAADc8YinAA...	# Detailed Description ## Main Subject The im...
1	c8ae8bad-9f5b-40fc-b292-3662b5a9d742	0	iVBORw0KGgoAAAANSUhEUgAAAiQAAAIACAIAAAA9rOAHAA...	# Detailed Description ## Main Subject The pr...
2	85eb94d7-a00d-436a-a75c-f3a867b6c64c	0	iVBORw0KGgoAAAANSUhEUgAAAqoAAAIACAIAAADFYNm1AA...	# Cat on Wooden Floor ## Main Subject The pri...
3	a4095d29-0b51-4c1c-a3be-003f66f3dc1b	0	iVBORw0KGgoAAAANSUhEUgAAAwAAAAIACAIAAAC6lJxtAA...	# Cat in a Green Container ## Main Subject Th...
4	7db77bef-1bca-4d57-babd-6426ff5632af	0	iVBORw0KGgoAAAANSUhEUgAAAqoAAAIACAIAAADFYNm1AA...	Based on the image provided, here is a detaile...

Python

# Load the analysis results into memory.
analysis = results.load_analysis()

analysis.to_report()

Output

──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records               ┃ number of columns               ┃ percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 10                              │ 1                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                  ┃               ┃                              ┃       prompt tokens ┃       completion tokens ┃
┃ column name      ┃     data type ┃         number unique values ┃          per record ┃              per record ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ description      │        string │                  10 (100.0%) │        29.0 +/- 0.0 │          306.5 +/- 61.2 │
└──────────────────┴───────────────┴──────────────────────────────┴─────────────────────┴─────────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
│                                                                                                                 │
│  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              │
│  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               │
│                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

⏭️ Next Steps

Now that you've learned how to use visual context for image summarization in Data Designer, explore more:

Experiment with different vision models for specific image types
Try different prompt variations to generate specialized descriptions (e.g., technical details, key findings)
Combine image, audio, or video context with other column types after confirming your selected model supports those modalities
Apply this pattern to other vision tasks like image captioning, OCR validation, or visual question answering
Generating images with Data Designer