Multi-Modal Context in NeMo Data Designer#

Data Designer supports multi-modal context, allowing you to incorporate images into LLM-based column generation. This enables more sophisticated synthetic data generation pipelines in which vision-enabled LLMs analyze and respond to visual content alongside text prompts.

Overview#

Multi-modal context injection allows you to reference image data from columns in your dataset when generating content with LLM-based columns. This is particularly useful for workflows that combine text and visual information:

  • Generating descriptions and captions of images

  • Generating question-answer pairs from images such as charts and tables for enterprise document intelligence

  • Creating content based on visual analysis

Image Context Configuration#

To use multi-modal context, you need to configure ImageContext objects with the following parameters:

ImageContext Configuration Parameters#

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| column_name | str | Yes | | The name of the column containing image data in your dataset |
| data_type | ModalityDataType | Yes | | How the image is stored. Options: URL or BASE64 |
| image_format | ImageFormat | No | None | The format of the image. Options: PNG, JPG, JPEG, GIF, WEBP |
| modality | Modality | No | Modality.IMAGE | The type of modality. Currently only "image" is supported |
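
For reference, here is an ImageContext sketch with every parameter from the table spelled out. The column name product_photos is a placeholder, and the explicit modality argument is redundant since Modality.IMAGE is the default:

from nemo_microservices.beta.data_designer.config import params as P

image_context = P.ImageContext(
    column_name="product_photos",          # placeholder: a column in your dataset holding image data
    data_type=P.ModalityDataType.BASE64,   # images stored as base64-encoded strings
    image_format=P.ImageFormat.JPEG,       # optional hint about the image encoding
    modality=P.Modality.IMAGE,             # default value; images are currently the only supported modality
)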

Image Data Types#

When your images are stored as URLs in your dataset:

from nemo_microservices.beta.data_designer.config import params as P

image_context = P.ImageContext(
    column_name="image_urls",
    data_type=P.ModalityDataType.URL
)

When your images are stored as base64-encoded strings in your dataset:

from nemo_microservices.beta.data_designer.config import params as P

image_context = P.ImageContext(
    column_name="image_data",
    data_type=P.ModalityDataType.BASE64,
    image_format=P.ImageFormat.PNG
)

Basic Example: Image Description Generation#

Here’s an example configuration that generates descriptions of images:

from nemo_microservices.beta.data_designer.config import columns as C
from nemo_microservices.beta.data_designer.config import params as P


# create the config_builder object
...

# Add a column with image URLs. Replace the example URLs below with publicly accessible images of your choice.
config_builder.add_column(
    C.SamplerColumn(
        name="image_urls",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=[
                "https://example.com/image1.jpg",
                "https://example.com/image2.jpg",
                "https://example.com/image3.jpg"
            ]
        )
    )
)

# Add LLM column that generates descriptions using image context
config_builder.add_column(
    C.LLMTextColumn(
        name="image_description",
        prompt="Describe this image in detail. Focus on the visual elements, colors, composition, and any objects or scenes you can identify.",
        model_alias="vision_model",
        multi_modal_context=[
            P.ImageContext(
                column_name="image_urls",
                data_type=P.ModalityDataType.URL
            )
        ]
    )
)

# Generate the data
preview = data_designer_client.preview(config_builder)
preview.display_sample_record()
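
Once the preview looks correct, you can launch a full generation job. The call below is a sketch only; the method name and the num_records and wait_until_done parameters should be verified against your version of the NeMo microservices SDK:

# Launch a full generation job (verify this call against your SDK version)
job_results = data_designer_client.create(config_builder, num_records=100, wait_until_done=True)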

Working with Base64 Images from Seed Datasets#

A more practical approach is to load images from a local directory, encode them as base64, and use them as a seed dataset. This allows you to work with your own image collections.

Loading Images from Directory#

Here’s how to create a seed dataset with base64-encoded images:

import base64
import io
import pandas as pd
from pathlib import Path
from PIL import Image


def create_image_dataset(image_directory: str, output_parquet: str = "image_dataset.parquet") -> pd.DataFrame:
    """Create a Parquet dataset from images in a directory, converting all to PNG format."""
    image_dir = Path(image_directory)
    image_files = list(image_dir.glob("*.jpg")) + list(image_dir.glob("*.png")) + list(image_dir.glob("*.jpeg"))

    data = []
    for img_path in image_files:
        try:
            # Open the image with PIL and re-encode it as PNG
            with Image.open(img_path) as img:
                # Normalize palette/alpha modes to RGB for consistent PNG encoding
                if img.mode in ("RGBA", "LA", "P"):
                    img = img.convert("RGB")
                buffer = io.BytesIO()
                img.save(buffer, format="PNG")
                image_bytes = buffer.getvalue()
                base64_data = base64.b64encode(image_bytes).decode("utf-8")

            data.append({
                "image_filename": img_path.name,
                "image_path": str(img_path),
                "image_base64": base64_data,
                "image_format": "png",
            })
        except Exception as e:
            print(f"Error processing {img_path}: {e}")

    df = pd.DataFrame(data)
    df.to_parquet(output_parquet, index=False)
    return df

# Create the dataset and inspect it
image_dataset = create_image_dataset("./images")
print(f"Created dataset with {len(image_dataset)} images")
print(image_dataset.head())
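
Before using the dataset, it can be worth round-tripping one record to confirm the base64 encoding is valid. This quick check (reusing the imports above) decodes the first row back into a PIL image:

# Sanity check: decode the first record back into a PIL image
sample_b64 = image_dataset.iloc[0]["image_base64"]
decoded_image = Image.open(io.BytesIO(base64.b64decode(sample_b64)))
print(decoded_image.format, decoded_image.size)  # expect: PNG (width, height)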

Using the Seed Dataset with Multi-Modal Context#

Now you can use the image dataset created above as a seed for Data Designer:

from nemo_microservices.beta.data_designer.config import columns as C
from nemo_microservices.beta.data_designer.config import params as P

# create the config builder object
...

# Load the seed dataset with base64 images
config_builder.with_seed_dataset(
    repo_id="sample/image-dataset",
    dataset_path="image_dataset.parquet",
    sampling_strategy="shuffle",
    with_replacement=True,
    datastore={"endpoint": "http://localhost:3000/v1/hf"}
)
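
The seed file must already exist in the datastore under the given repo_id. The NeMo Data Store exposes a Hugging Face-compatible API, so one way to upload the Parquet file is with huggingface_hub; the repo_id and endpoint below mirror the values used above, and authentication is omitted on the assumption of a local, unauthenticated deployment:

from huggingface_hub import HfApi

# Upload the seed file to the HF-compatible datastore endpoint used above
hf_api = HfApi(endpoint="http://localhost:3000/v1/hf")

hf_api.create_repo(repo_id="sample/image-dataset", repo_type="dataset", exist_ok=True)
hf_api.upload_file(
    path_or_fileobj="image_dataset.parquet",
    path_in_repo="image_dataset.parquet",
    repo_id="sample/image-dataset",
    repo_type="dataset",
)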

# Add LLM column that generates descriptions using the base64 images
config_builder.add_column(
    C.LLMTextColumn(
        name="image_description",
        prompt="Analyze this image and provide a detailed description. Focus on the visual elements, colors, composition, and any objects or scenes you can identify.",
        model_alias="vision_model",
        multi_modal_context=[
            P.ImageContext(
                column_name="image_base64",
                data_type=P.ModalityDataType.BASE64,
                image_format=P.ImageFormat.PNG
            )
        ]
    )
)

# Generate the data
preview = data_designer_client.preview(config_builder)
preview.display_sample_record()

Best Practices#

Model Selection#

Ensure the model referenced by your model_alias is vision-capable; text-only models cannot process image context. Commonly used vision-capable models include:

  • mistralai/mistral-medium-3-instruct

  • meta/llama-3.2-90b-vision-instruct

  • meta/llama-4-maverick-17b-128e-instruct

Image Format Considerations#

  • For URL-based images, ensure the URLs are accessible from where Data Designer and the models are running.

  • For base64 images, the data must be valid base64 and the decoded bytes must match the specified image_format; one way to verify this is shown in the sketch below.
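
A minimal validation sketch, assuming a DataFrame like the one built earlier (the image_base64 column name is carried over from that example):

import base64

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_png(b64_string: str) -> bool:
    """Return True if the string is valid base64 whose payload starts with the PNG signature."""
    try:
        raw = base64.b64decode(b64_string, validate=True)
    except ValueError:  # binascii.Error is a ValueError subclass
        return False
    return raw.startswith(PNG_MAGIC)

# Example: validate every row of the dataset built earlier
# assert image_dataset["image_base64"].map(looks_like_png).all()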

Performance Considerations#

  • Vision models typically have higher latency than text-only models.

  • Consider the size, complexity, and clarity of the images in your dataset; downscaling oversized images before encoding reduces payload size and latency (see the sketch after this list).

  • Multiple images in a single context will increase processing time.
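
As a concrete example of keeping payloads small, the following sketch downscales an image before base64 encoding. The 1024-pixel cap is an arbitrary choice for illustration, not a Data Designer requirement:

import base64
import io
from PIL import Image

def encode_downscaled(path: str, max_side: int = 1024) -> str:
    """Downscale an image so its longest side is at most max_side, then base64-encode it as PNG."""
    with Image.open(path) as img:
        img.thumbnail((max_side, max_side))  # preserves aspect ratio; only ever shrinks
        if img.mode in ("RGBA", "LA", "P"):
            img = img.convert("RGB")
        buffer = io.BytesIO()
        img.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")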