Providing Images as Context

🎨 Data Designer Tutorial: Providing Images as Context for Vision-Based Data Generation

📚 What you'll learn

This notebook demonstrates how to provide images as context to generate text descriptions using vision-language models. The same multi_modal_context field can also carry audio or video context when the selected model supports those modalities.

  • Visual Document Processing: Converting images to chat-ready format for model consumption
  • 🔍 Vision-Language Generation: Using vision models to generate detailed summaries from images
  • 🧩 Media Context Pattern: Understanding how ImageContext, AudioContext, and VideoContext fit into the same configuration field

If this is your first time using Data Designer, we recommend starting with the first notebook in this tutorial series.

📦 Import Data Designer

  • data_designer.config provides access to the configuration API.

  • DataDesigner is the main interface for data generation.

Python
# Standard library imports
import base64
import io
import uuid

# Third-party imports
import pandas as pd
import rich
from datasets import load_dataset
from IPython.display import display
from rich.panel import Panel

# Data Designer imports
import data_designer.config as dd
from data_designer.interface import DataDesigner

⚙️ Initialize the Data Designer interface

  • DataDesigner is the main object responsible for managing the data generation process.

  • When initialized without arguments, the default model providers are used.

Python
data_designer = DataDesigner()

🏗️ Initialize the Data Designer Config Builder

  • The Data Designer config defines the dataset schema and generation process.

  • The config builder provides an intuitive interface for building this configuration.

  • When initialized without arguments, the default model configurations are used.

Python
config_builder = dd.DataDesignerConfigBuilder()

🌱 Seed Dataset Creation

In this section, we'll prepare a set of images as a seed dataset for description generation:

  • Loading Images: We use a small pets image dataset containing labeled images
  • Image Processing: Resize each image to a fixed height and convert it to base64 (PNG) for vision model consumption
  • Metadata Extraction: Preserve the label and attach a UUID to each record

The seed dataset will be used to generate detailed text descriptions of each image.

Python
# Dataset processing configuration
IMG_COUNT = 512  # Number of images to process
BASE64_IMAGE_HEIGHT = 512  # Standardized height for model input

# Load the pets dataset (train split, ~23 MB total)
img_dataset_cfg = {"path": "rokmr/pets", "split": "train"}
Python
def resize_image(image, height: int):
    """
    Resize image while maintaining aspect ratio.

    Args:
        image: PIL Image object
        height: Target height in pixels

    Returns:
        Resized PIL Image object
    """
    original_width, original_height = image.size
    width = int(original_width * (height / original_height))
    return image.resize((width, height))


def convert_image_to_chat_format(record, height: int) -> dict:
    """
    Convert PIL image to base64 format for chat template usage.

    Args:
        record: Dataset record containing image and metadata
        height: Target height for image resizing

    Returns:
        Updated record with base64_image and uuid fields
    """
    image = resize_image(record["image"], height)

    img_buffer = io.BytesIO()
    image.save(img_buffer, format="PNG")
    byte_data = img_buffer.getvalue()
    base64_encoded_data = base64.b64encode(byte_data)
    base64_string = base64_encoded_data.decode("utf-8")

    return record | {"base64_image": base64_string, "uuid": str(uuid.uuid4())}
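The width calculation inside resize_image is plain proportional scaling, so it can be checked without loading any image. The sketch below isolates that arithmetic in a hypothetical helper, scaled_size, which is not part of the notebook. The clamp to a minimum width of 1 px is an added assumption for extremely tall, narrow inputs; the original function does not do this.

```python
def scaled_size(original_size: tuple[int, int], height: int) -> tuple[int, int]:
    """Return (width, height) after scaling to `height`, keeping aspect ratio.

    Hypothetical helper that mirrors the arithmetic used in resize_image.
    """
    original_width, original_height = original_size
    if original_height <= 0 or height <= 0:
        raise ValueError("heights must be positive")
    # Same formula as resize_image: the scaled width is truncated to an integer.
    width = int(original_width * (height / original_height))
    # Assumption (not in the notebook): never return a zero-pixel width.
    return max(1, width), height


print(scaled_size((1024, 768), 512))  # landscape input
print(scaled_size((968, 1024), 512))  # portrait input
```

Because only the height is fixed, the resulting widths differ from image to image, which is why the encoded payloads in the seed dataset vary in size.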
Python
# Load and process the image dataset
print("📥 Loading and processing images...")

img_dataset = load_dataset(**img_dataset_cfg).map(
    convert_image_to_chat_format, fn_kwargs={"height": BASE64_IMAGE_HEIGHT}
)
img_dataset = pd.DataFrame(img_dataset[:IMG_COUNT])

print(f"✅ Loaded {len(img_dataset)} images with columns: {list(img_dataset.columns)}")
Output
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
📥 Loading and processing images...
README.md: 0.00B [00:00, ?B/s]
dataset_infos.json: 0.00B [00:00, ?B/s]
data/train.zip:   0%|          | 0.00/20.4M [00:00<?, ?B/s]
data/test.zip:   0%|          | 0.00/3.29M [00:00<?, ?B/s]
Generating train split:   0%|          | 0/900 [00:00<?, ? examples/s]
Generating test split:   0%|          | 0/150 [00:00<?, ? examples/s]
Map:   0%|          | 0/900 [00:00<?, ? examples/s]
✅ Loaded 512 images with columns: ['image', 'label', 'base64_image', 'uuid']
Python
img_dataset.head()
Output
   image                                              label  base64_image                                       uuid
0  <PIL.JpegImagePlugin.JpegImageFile image mode=...  0      iVBORw0KGgoAAAANSUhEUgAAAeQAAAIACAIAAADc8YinAA...  87f84627-9911-4344-9e18-07d39c8f36d1
1  <PIL.JpegImagePlugin.JpegImageFile image mode=...  0      iVBORw0KGgoAAAANSUhEUgAAAiQAAAIACAIAAAA9rOAHAA...  c8ae8bad-9f5b-40fc-b292-3662b5a9d742
2  <PIL.JpegImagePlugin.JpegImageFile image mode=...  0      iVBORw0KGgoAAAANSUhEUgAAAqoAAAIACAIAAADFYNm1AA...  85eb94d7-a00d-436a-a75c-f3a867b6c64c
3  <PIL.JpegImagePlugin.JpegImageFile image mode=...  0      iVBORw0KGgoAAAANSUhEUgAAAwAAAAIACAIAAAC6lJxtAA...  a4095d29-0b51-4c1c-a3be-003f66f3dc1b
4  <PIL.PngImagePlugin.PngImageFile image mode=RG...  0      iVBORw0KGgoAAAANSUhEUgAAAqoAAAIACAIAAADFYNm1AA...  7db77bef-1bca-4d57-babd-6426ff5632af
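Every base64_image value above starts with iVBORw0KGgo, which is what the 8-byte PNG signature looks like once base64-encoded, so the prefix alone confirms the images were saved as PNG. The sketch below goes one step further and reads the width and height from the PNG IHDR header using only the first 32 base64 characters. png_size_from_base64 is a hypothetical helper, not part of Data Designer; the sample string is the visible prefix of row 0 in the output above.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_base64(b64_prefix: str) -> tuple[int, int]:
    """Read (width, height) from the start of a base64-encoded PNG.

    Only the first 32 base64 characters (24 bytes) are needed: the 8-byte
    signature, the IHDR chunk length and type, then width and height.
    """
    raw = base64.b64decode(b64_prefix[:32])
    if len(raw) < 24 or raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG payload")
    # Width and height are stored as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", raw[16:24])
    return width, height


# Visible prefix of the first row's base64_image value.
sample = "iVBORw0KGgoAAAANSUhEUgAAAeQAAAIACAIAAADc8YinAA"
print(png_size_from_base64(sample))  # (484, 512)
```

The decoded height of 512 matches BASE64_IMAGE_HEIGHT, which is a quick way to confirm that the resize step ran before encoding.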
Python
# Add the seed dataset containing our processed images
df_seed = pd.DataFrame(img_dataset)[["uuid", "label", "base64_image"]]
config_builder.with_seed_dataset(dd.DataFrameSeedSource(df=df_seed))
Output
DataDesignerConfigBuilder(
    seed_dataset: df seed
)

🧩 Media context and model capabilities

multi_modal_context accepts media context descriptors such as ImageContext, AudioContext, and VideoContext. Data Designer reads the referenced seed columns and serializes them for the model request, but the selected model still determines which modalities are valid.

This notebook uses image context only because image-capable VLMs are broadly available. Before combining image, audio, and video in one column, choose a model alias backed by an omni or otherwise modality-compatible model, and check that the provider accepts every context type you send.

For base64 seed columns, store the raw base64 payload without a data:<media-type>;base64, prefix and specify the media format on the context object:

media_context = [
    dd.ImageContext(
        column_name="image_base64",
        data_type=dd.ModalityDataType.BASE64,
        image_format=dd.ImageFormat.PNG,
    ),
    dd.AudioContext(
        column_name="audio_base64",
        data_type=dd.ModalityDataType.BASE64,
        audio_format=dd.AudioFormat.MP3,
    ),
    dd.VideoContext(
        column_name="video_base64",
        data_type=dd.ModalityDataType.BASE64,
        video_format=dd.VideoFormat.MP4,
    ),
]

URL-backed media can use data_type=dd.ModalityDataType.URL, subject to the provider's URL support and file-size limits. Local audio/video paths require explicit URL mode and require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access.
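Since base64 seed columns are expected to hold the raw payload, values that arrive as data URIs need their prefix removed before they go into the seed DataFrame. A minimal sketch of that cleanup, using only the standard library, might look like the following. to_raw_base64 is a hypothetical helper name and not a Data Designer function.

```python
import base64
import re

# Matches a leading "data:<media-type>[;param=value...];base64," prefix.
_DATA_URI_PREFIX = re.compile(r"^data:[^,]*;base64,", re.IGNORECASE)


def to_raw_base64(value: str) -> str:
    """Strip an optional data-URI prefix and verify the rest is valid base64."""
    payload = _DATA_URI_PREFIX.sub("", value.strip(), count=1)
    # validate=True rejects characters outside the base64 alphabet
    # (raises binascii.Error, a subclass of ValueError).
    base64.b64decode(payload, validate=True)
    return payload


print(to_raw_base64("data:image/png;base64,aGVsbG8="))  # prefix removed
print(to_raw_base64("aGVsbG8="))                        # already raw, unchanged
```

Applied to a DataFrame column, this could be used as df["image_base64"].map(to_raw_base64) before the seed dataset is registered; the column name here follows the example above and is only illustrative.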

Python
# Add a column to generate detailed image descriptions
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        model_alias="nvidia-vision",
        prompt=(
            "Provide a detailed description of the content in this image in Markdown format. "
            "Describe the main subject, background, colors, and any notable details."
        ),
        multi_modal_context=[dd.ImageContext(column_name="base64_image")],
    )
)

data_designer.validate(config_builder)
Output
[13:25:53] [INFO] ✅ Validation passed

🔁 Iteration is key – preview the dataset!

  1. Use the preview method to generate a sample of records quickly.

  2. Inspect the results for quality and format issues.

  3. Adjust column configurations, prompts, or parameters as needed.

  4. Re-run the preview until satisfied.

Python
preview = data_designer.preview(config_builder, num_records=2)
Output
[13:25:53] [INFO] 👁️ Preview generation in progress
[13:25:53] [INFO]   |-- 🔒 Jinja rendering engine: secure
[13:25:53] [INFO] ✅ Validation passed
[13:25:53] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[13:25:53] [INFO] 🩺 Running health checks for models...
[13:25:53] [INFO]   |-- 👀 Checking 'nvidia/nemotron-3-nano-omni-30b-a3b-reasoning' in provider named 'nvidia' for model alias 'nvidia-vision'...
[13:25:55] [INFO]   |-- ✅ Passed!
[13:25:55] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue preview
[13:25:55] [INFO] 📝 llm-text model config for column 'description'
[13:25:55] [INFO]   |-- model: 'nvidia/nemotron-3-nano-omni-30b-a3b-reasoning'
[13:25:55] [INFO]   |-- model alias: 'nvidia-vision'
[13:25:55] [INFO]   |-- model provider: 'nvidia'
[13:25:55] [INFO]   |-- inference parameters:
[13:25:55] [INFO]   |  |-- generation_type=chat-completion
[13:25:55] [INFO]   |  |-- max_parallel_requests=4
[13:25:55] [INFO]   |  |-- temperature=0.60
[13:25:55] [INFO]   |  |-- top_p=0.95
[13:25:55] [INFO] ⚡️ Async generation: 1 column(s) (description), 2 tasks across 1 row group(s)
[13:25:55] [INFO] 🚀 (1/1) Dispatching with 2 records
[13:25:55] [INFO] 🌱 (1/1) Sampling 2 records from seed dataset
[13:25:55] [INFO]   |-- seed dataset size: 512 records
[13:25:55] [INFO]   |-- sampling strategy: ordered
[13:26:00] [INFO] 📊 Progress [4.8s]:
[13:26:00] [INFO]   |-- 🤩 description: 2/2 (100%) 0.4 rec/s
[13:26:00] [INFO] ✅ Async generation complete [4.8s]: 2 ok, 0 failed across 1 column(s)
[13:26:00] [INFO] 📊 Model usage summary:
[13:26:00] [INFO]   |-- model: nvidia/nemotron-3-nano-omni-30b-a3b-reasoning
[13:26:00] [INFO]   |-- tokens: input=658, output=1985, reasoning=1198 (estimated), total=2643, tps=545
[13:26:00] [INFO]   |-- reasoning token count estimated with tiktoken
[13:26:00] [INFO]   |-- requests: success=2, failed=0, total=2, rpm=24
[13:26:00] [INFO] 📐 Measuring dataset column statistics:
[13:26:00] [INFO]   |-- 📝 column: 'description'
[13:26:00] [INFO] 🥳 Preview complete!
Python
# Run this cell multiple times to cycle through the 2 preview records.
preview.display_sample_record()
Output
[index: 0]
                                                                                                              
                                                 Seed Columns                                                 
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name          Value                                                                                       ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ uuid         │ 87f84627-9911-4344-9e18-07d39c8f36d1                                                        │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────────┤
│ label        │ 0                                                                                           │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────────┤
│ base64_image │ iVBORw0KGgoAAAANSUhEUgAAAeQAAAIACAIAAADc8YinAAEAAElEQVR4nOy9V5ckuZEmamZwEREpSna1YAv28JLDHT… │
└──────────────┴─────────────────────────────────────────────────────────────────────────────────────────────┘
                                                                                                              
                                                                                                              
                                              Generated Columns                                               
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name         Value                                                                                        ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ description │ # Close-up Portrait of a Black and White Cat                                                 │
│             │                                                                                              │
│             │ ## Main Subject                                                                              │
│             │ The image features a close-up shot of a domestic cat, likely a tuxedo or bicolor breed. The  │
│             │ cat is positioned centrally, filling most of the frame from the chest up. It is looking      │
│             │ slightly upward and forward with an attentive, wide-eyed expression.                         │
│             │                                                                                              │
│             │ *   **Eyes:** The cat has large, round eyes that are a striking yellow-green color with      │
│             │ vertical black pupils. The gaze is intense and focused.                                      │
│             │ *   **Fur Pattern:** The fur is distinctly two-toned. The ears, the sides of the head, and   │
│             │ patches around the eyes are black. A broad stripe of white fur runs down the center of the   │
│             │ forehead, between the eyes, and covers the nose, mouth area, and chest.                      │
│             │ *   **Face:** The nose is small, black, and triangular. The mouth is closed in a neutral,    │
│             │ slightly downturned line, giving the cat a somewhat serious or curious look. Long, thin      │
│             │ white whiskers extend outward from the muzzle on both sides.                                 │
│             │ *   **Ears:** The ears are pointed and upright, indicating alertness. The inside of the ears │
│             │ shows some lighter fur mixed with black.                                                     │
│             │                                                                                              │
│             │ ## Background                                                                                │
│             │ The background is simple and out of focus, which helps emphasize the cat as the main         │
│             │ subject.                                                                                     │
│             │ *   **Left/Top:** A plain, light-colored wall (appearing off-white or very light grey).      │
│             │ *   **Right:** A vertical section of a light brown, possibly wooden surface, likely a door   │
│             │ frame or furniture edge.                                                                     │
│             │                                                                                              │
│             │ ## Colors and Lighting                                                                       │
│             │ *   **Color Palette:** The dominant colors are black, white, and the yellow-green of the     │
│             │ eyes. The background introduces neutral tones of white/grey and tan/brown.                   │
│             │ *   **Lighting:** The lighting appears to be soft and diffuse, coming from the front. It     │
│             │ illuminates the cat's face evenly without creating harsh shadows, highlighting the texture   │
│             │ of the fur and the shine in the eyes.                                                        │
└─────────────┴──────────────────────────────────────────────────────────────────────────────────────────────┘
                                                                                                              
Python
# The preview dataset is available as a pandas DataFrame.
preview.dataset
Output
   uuid                                  label  base64_image                                       description
0  87f84627-9911-4344-9e18-07d39c8f36d1  0      iVBORw0KGgoAAAANSUhEUgAAAeQAAAIACAIAAADc8YinAA...  # Close-up Portrait of a Black and White Cat\n...
1  c8ae8bad-9f5b-40fc-b292-3662b5a9d742  0      iVBORw0KGgoAAAANSUhEUgAAAiQAAAIACAIAAAA9rOAHAA...  # Detailed Description of the Image\n\n**Main ...

📊 Analyze the generated data

  • Data Designer automatically generates a basic statistical analysis of the generated data.

  • This analysis is available via the analysis property of generation result objects.

Python
# Print the analysis as a table.
preview.analysis.to_report()
Output
──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records                number of columns                percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2                               │ 1                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                      prompt tokens        completion tokens ┃
┃ column name           data type          number unique values           per record               per record ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ description      │        string │                   2 (100.0%) │        29.0 +/- 0.0 │          383.0 +/- 25.5 │
└──────────────────┴───────────────┴──────────────────────────────┴─────────────────────┴─────────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
                                                                                                                 
  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              
  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               
                                                                                                                 
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

🔎 Visual Inspection

Let's compare the original image with the generated description to validate quality:

Python
# Compare original image with generated description
index = 0  # Change this to view different examples

# Merge preview data with original images for comparison
comparison_dataset = preview.dataset.merge(pd.DataFrame(img_dataset)[["uuid", "image"]], how="left", on="uuid")

# Extract the record for display
record = comparison_dataset.iloc[index]

print("📄 Original Image:")
display(resize_image(record.image, BASE64_IMAGE_HEIGHT))

print("\n📝 Generated Description:")
rich.print(Panel(record.description, title="Image Description", title_align="left"))
Output
📄 Original Image:
[Image output: the original image for the selected record, resized to a height of 512 px]

📝 Generated Description:
╭─ Image Description ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ # Close-up Portrait of a Black and White Cat                                                                    │
│                                                                                                                 │
│ ## Main Subject                                                                                                 │
│ The image features a close-up shot of a domestic cat, likely a tuxedo or bicolor breed. The cat is positioned   │
│ centrally, filling most of the frame from the chest up. It is looking slightly upward and forward with an       │
│ attentive, wide-eyed expression.                                                                                │
│                                                                                                                 │
│ *   **Eyes:** The cat has large, round eyes that are a striking yellow-green color with vertical black pupils.  │
│ The gaze is intense and focused.                                                                                │
│ *   **Fur Pattern:** The fur is distinctly two-toned. The ears, the sides of the head, and patches around the   │
│ eyes are black. A broad stripe of white fur runs down the center of the forehead, between the eyes, and covers  │
│ the nose, mouth area, and chest.                                                                                │
│ *   **Face:** The nose is small, black, and triangular. The mouth is closed in a neutral, slightly downturned   │
│ line, giving the cat a somewhat serious or curious look. Long, thin white whiskers extend outward from the      │
│ muzzle on both sides.                                                                                           │
│ *   **Ears:** The ears are pointed and upright, indicating alertness. The inside of the ears shows some lighter │
│ fur mixed with black.                                                                                           │
│                                                                                                                 │
│ ## Background                                                                                                   │
│ The background is simple and out of focus, which helps emphasize the cat as the main subject.                   │
│ *   **Left/Top:** A plain, light-colored wall (appearing off-white or very light grey).                         │
│ *   **Right:** A vertical section of a light brown, possibly wooden surface, likely a door frame or furniture   │
│ edge.                                                                                                           │
│                                                                                                                 │
│ ## Colors and Lighting                                                                                          │
│ *   **Color Palette:** The dominant colors are black, white, and the yellow-green of the eyes. The background   │
│ introduces neutral tones of white/grey and tan/brown.                                                           │
│ *   **Lighting:** The lighting appears to be soft and diffuse, coming from the front. It illuminates the cat's  │
│ face evenly without creating harsh shadows, highlighting the texture of the fur and the shine in the eyes.      │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

🆙 Scale up!

  • Happy with your preview data?

  • Use the create method to submit larger Data Designer generation jobs.

Python
results = data_designer.create(config_builder, num_records=10, dataset_name="tutorial-4")
Output
[13:26:00] [INFO] 🎨 Creating Data Designer dataset
[13:26:00] [INFO]   |-- 🔒 Jinja rendering engine: secure
[13:26:00] [INFO] ✅ Validation passed
[13:26:00] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[13:26:00] [INFO] 🩺 Running health checks for models...
[13:26:00] [INFO]   |-- 👀 Checking 'nvidia/nemotron-3-nano-omni-30b-a3b-reasoning' in provider named 'nvidia' for model alias 'nvidia-vision'...
[13:26:00] [INFO]   |-- ✅ Passed!
[13:26:00] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue builder
[13:26:00] [INFO] 📝 llm-text model config for column 'description'
[13:26:00] [INFO]   |-- model: 'nvidia/nemotron-3-nano-omni-30b-a3b-reasoning'
[13:26:00] [INFO]   |-- model alias: 'nvidia-vision'
[13:26:00] [INFO]   |-- model provider: 'nvidia'
[13:26:00] [INFO]   |-- inference parameters:
[13:26:00] [INFO]   |  |-- generation_type=chat-completion
[13:26:00] [INFO]   |  |-- max_parallel_requests=4
[13:26:00] [INFO]   |  |-- temperature=0.60
[13:26:00] [INFO]   |  |-- top_p=0.95
[13:26:00] [INFO] ⚡️ Async generation: 1 column(s) (description), 10 tasks across 1 row group(s)
[13:26:00] [INFO] 🚀 (1/1) Dispatching with 10 records
[13:26:00] [INFO] 🌱 (1/1) Sampling 10 records from seed dataset
[13:26:00] [INFO]   |-- seed dataset size: 512 records
[13:26:00] [INFO]   |-- sampling strategy: ordered
[13:26:05] [INFO] 📊 Progress [5.1s]:
[13:26:05] [INFO]   |-- 🌦️ description: 3/10 (30%) 0.6 rec/s
[13:26:11] [INFO] 📊 Progress [10.5s]:
[13:26:11] [INFO]   |-- ⛅ description: 6/10 (60%) 0.6 rec/s
[13:26:16] [INFO] 📊 Progress [15.5s]:
[13:26:16] [INFO]   |-- ☀️ description: 10/10 (100%) 0.6 rec/s
[13:26:16] [INFO] ✅ Async generation complete [15.6s]: 10 ok, 0 failed across 1 column(s)
[13:26:16] [INFO] 📊 Model usage summary:
[13:26:16] [INFO]   |-- model: nvidia/nemotron-3-nano-omni-30b-a3b-reasoning
[13:26:16] [INFO]   |-- tokens: input=3720, output=9214, reasoning=5820 (estimated), total=12934, tps=822
[13:26:16] [INFO]   |-- reasoning token count estimated with tiktoken
[13:26:16] [INFO]   |-- requests: success=10, failed=0, total=10, rpm=38
[13:26:16] [INFO] 📐 Measuring dataset column statistics:
[13:26:16] [INFO]   |-- 📝 column: 'description'
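The log above reports 10 records completed in 15.6 s and 12,934 total tokens. Before requesting all 512 seed images, a rough linear extrapolation from those figures gives a feel for run time and token volume. This is a back-of-the-envelope sketch: it assumes throughput stays constant as the job grows, which rate limits and provider load may not allow, and estimate_run is a hypothetical helper rather than anything Data Designer provides.

```python
def estimate_run(records_done: int, seconds: float, total_tokens: int, target_records: int) -> dict:
    """Linearly extrapolate run time and token usage from a small completed run."""
    if records_done <= 0:
        raise ValueError("records_done must be positive")
    seconds_per_record = seconds / records_done
    tokens_per_record = total_tokens / records_done
    return {
        "estimated_seconds": seconds_per_record * target_records,
        "estimated_tokens": round(tokens_per_record * target_records),
    }


# Figures taken from the 10-record run logged above.
estimate = estimate_run(records_done=10, seconds=15.6, total_tokens=12934, target_records=512)
print(f"{estimate['estimated_seconds'] / 60:.1f} min, ~{estimate['estimated_tokens']:,} tokens")
```

Under these assumptions, the full 512-image seed set would take roughly 13 minutes and on the order of 660,000 tokens.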
Python
# Load the generated dataset as a pandas DataFrame.
dataset = results.load_dataset()

dataset.head()
Output
   uuid                                  label  base64_image                                       description
0  87f84627-9911-4344-9e18-07d39c8f36d1  0      iVBORw0KGgoAAAANSUhEUgAAAeQAAAIACAIAAADc8YinAA...  # Detailed Description ## Main Subject The im...
1  c8ae8bad-9f5b-40fc-b292-3662b5a9d742  0      iVBORw0KGgoAAAANSUhEUgAAAiQAAAIACAIAAAA9rOAHAA...  # Detailed Description ## Main Subject The pr...
2  85eb94d7-a00d-436a-a75c-f3a867b6c64c  0      iVBORw0KGgoAAAANSUhEUgAAAqoAAAIACAIAAADFYNm1AA...  # Cat on Wooden Floor ## Main Subject The pri...
3  a4095d29-0b51-4c1c-a3be-003f66f3dc1b  0      iVBORw0KGgoAAAANSUhEUgAAAwAAAAIACAIAAAC6lJxtAA...  # Cat in a Green Container ## Main Subject Th...
4  7db77bef-1bca-4d57-babd-6426ff5632af  0      iVBORw0KGgoAAAANSUhEUgAAAqoAAAIACAIAAADFYNm1AA...  Based on the image provided, here is a detaile...
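Row 4 above opens with a conversational preamble ("Based on the image provided, ...") while the other rows start directly with a Markdown heading. One way to catch that kind of format drift before scaling further is a small check over the description column. The sketch below is an illustration in plain Python; find_format_issues and its min_chars threshold are hypothetical and should be adapted to whatever format the prompt asks for. The sample strings are made up for the demonstration and are not taken from the generated dataset.

```python
def find_format_issues(descriptions: list[str], min_chars: int = 200) -> dict[int, list[str]]:
    """Map row index -> list of issues; rows without issues are omitted."""
    issues: dict[int, list[str]] = {}
    for index, text in enumerate(descriptions):
        problems = []
        stripped = text.strip()
        # The prompt asks for Markdown, so a heading is expected up front.
        if not stripped.startswith("#"):
            problems.append("does not start with a Markdown heading")
        # Very short outputs are unlikely to be a "detailed description".
        if len(stripped) < min_chars:
            problems.append(f"shorter than {min_chars} characters")
        if problems:
            issues[index] = problems
    return issues


# Made-up samples: one well-formed description, one with a preamble.
samples = [
    "# Cat on Wooden Floor\n\n## Main Subject\n" + "The cat sits on a wooden floor. " * 10,
    "Based on the image provided, here is a detailed description. " * 5,
]
print(find_format_issues(samples))
```

With the notebook's results, the input would be dataset["description"].tolist(). Rows that are flagged can then guide a prompt adjustment, for example asking the model to begin its answer with a top-level heading.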
Python
# Load the analysis results into memory.
analysis = results.load_analysis()

analysis.to_report()
Output
──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records                number of columns                percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 10                              │ 1                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                      prompt tokens        completion tokens ┃
┃ column name           data type          number unique values           per record               per record ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ description      │        string │                  10 (100.0%) │        29.0 +/- 0.0 │          306.5 +/- 61.2 │
└──────────────────┴───────────────┴──────────────────────────────┴─────────────────────┴─────────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
                                                                                                                 
  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              
  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               
                                                                                                                 
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

⏭️ Next Steps

Now that you've learned how to use visual context for image summarization in Data Designer, explore more:

  • Experiment with different vision models for specific image types

  • Try different prompt variations to generate specialized descriptions (e.g., technical details, key findings)

  • Combine image, audio, or video context with other column types after confirming your selected model supports those modalities

  • Apply this pattern to other vision tasks like image captioning, OCR validation, or visual question answering

  • Continue to the next tutorial in this series: Generating images with Data Designer